Foreword


Welcome to the LAC2014 in Karlsruhe! 

Around eleven years ago the Advanced Linux Sound Architecture (ALSA) was merged 
into the mainline Linux Kernel. It was introduced in the 2.5 development kernel and 
marked a major milestone in the GNU/Linux audio world. 

In the wake of this, and after a small audio-focused event at LinuxTag, a group of developers teamed up with enthusiasts with the goal of organizing the first Linux Audio Conference: an event to exchange knowledge and foster discussions surrounding Linux kernel-based systems and allied libre software for audio-related work.

The LAC was born and its first instalment was hosted at the ZKM in March 2003. 

Due to its success, it became an annual event, with the first four conferences (2003-2006) taking place at the ZKM, after which the conference started traveling, first nationally, then internationally, evolving and growing with each iteration.

Fast-forward 11 years, and we can say without false modesty that the endeavour thrived. It grew from a small group of developers into a conference attracting artists and academics alike. The number of projects that were initially conceived at LACs over the years only underlines the importance of the personal face-to-face meetings and discussions that a conference like the LAC facilitates.

While many things have changed since, we are pleased that the fundamentals of the LAC laid down by the first organizers have survived, in particular scientific rigour. As opposed to many other similar Linux developer or hacker events, the LAC is one of the few that retains a peer-review process, and these printed proceedings emphasize the importance of written documentation, particularly in a field where interface definitions as well as signal processing mathematics are important.

As for new additions, LAC’14 will pick up on last year’s schedule, which featured a well-received day off with an excursion to the countryside. This year, Sunday, 4 May, is reserved for a trip to the nearby city of Bruchsal and its palace, which hosts the German Museum of Mechanical Musical Instruments.

Similarly, the dedicated slot for lightning talks reappears on the schedule, an opportunity for developers to pitch ongoing projects and present their work in a less formal session. A new addition to the LAC’14 program is the poster session, which we are confident will prove a useful complement to the paper presentations taking place over the course of three days.

We are excited about the Linux Audio Conference 2014, featuring a tightly packed, diverse schedule with 77 events by over 100 people: five full evening concerts and 25 presentations in just three days! Even though the schedule is tightly packed, it was a tough decision for the music jury and paper review committee to choose from an even greater number of excellent submissions, taking into account the inevitable limitations in time and resources.

The 12th edition this year marks the ‘LAC coming home’ to ZKM. We hope to provide a pleasant experience to all conference participants and wish that you feel at home here, as well.

We would like to express our gratitude to everyone involved with the conference and 
particularly like to thank Linus van Geuns for kindly lending us his video streaming 
server hardware. 

Most importantly we would like to thank you, the worldwide Linux Audio Community. 
Neither the conference nor Linux would work if it was not for the presence of such a 
dedicated group. 

Have a great time in Karlsruhe! 


Ludger Brümmer, Götz Dipper, Robin Gareus,
Frank Neumann and Jochen Arne Otto
A TouchOSC MIDI Bridge for Linux


Albert GRÄF

Computer Music Research Group 
Institute of Art History and Musicology (IKM) 
Johannes Gutenberg University (JGU) Mainz, Germany 
aggraef@gmail.com


Abstract 

Mobile applications such as hexler’s TouchOSC offer a cheap and convenient alternative to traditional controller hardware for computer music programs. TouchOSC is available for Android and iOS devices and supports both OSC and MIDI, two widespread standards for transmitting control data between computer music applications. On the host side, MIDI support requires the TouchOSC MIDI Bridge, which unfortunately is proprietary software and only available for Mac and Windows systems. This paper presents pd-touchosc, a library of Pd externals which aims to bring most of the functionality of the TouchOSC MIDI Bridge to Linux.

Keywords 

TouchOSC, controller, OSC, MIDI 

1 Introduction 

Any reader familiar with the area of computer music will have heard of JazzMutant’s Lemur controller [5], a big multitouch device with built-in OSC¹ and MIDI² support, which was fully configurable using a kind of GUI builder for control surfaces. Nowadays, the Lemur’s place is taken by mobile apps running on modern (and much cheaper) devices such as smartphones and tablets. (It is no accident that the demise of the original Lemur hardware was brought about by the advent of the iPad.) The Lemur lives on as a mobile app on iOS³, and there are other similar apps on both Android and iOS.

One of these is hexler’s TouchOSC [4]. While it lacks some of the Lemur’s more advanced features such as physical models and scripting capabilities, it certainly offers enough features to create fairly sophisticated interfaces and is also much cheaper. It comes with its own graphical layout editor (which is written in Java and thus runs on Linux just as well as on Mac and

1 http://opensoundcontrol.org/

2 http://www.midi.org/

3 http://liine.net/en/products/lemur/


Windows). Like the Lemur, TouchOSC supports both OSC and MIDI, and the layout of the controller elements is fully configurable, so that the user can tailor the graphical interface to the computer music application at hand. This sets it apart from applications like TouchDAW⁴, which provide fixed interfaces usually inspired by existing MIDI controller designs. There are other apps similar to the Lemur and TouchOSC, such as OSCPad⁵, which is more or less compatible with the TouchOSC layout format, and Charlie Roberts’ open-source app Control⁶, which features its own JSON format and is both scriptable and extensible using JavaScript. But among these, TouchOSC seems to be the most mature and popular option right now, not least because of its graphical layout editor.

One of the downsides of TouchOSC for Linux users, however, is its MIDI support, which requires either the TouchOSC MIDI Bridge program or an RTP-MIDI⁷ interface on the host side, neither of which is readily available on Linux. (The TouchOSC MIDI Bridge is closed-source software only available for Mac and Windows, and drivers for RTP-MIDI are hard to find for Linux these days. Moreover, the RTP-MIDI protocol doesn’t seem to be supported in the Android version of the TouchOSC app anyway.) So the author set out to create a TouchOSC MIDI bridge replacement for Linux, which is what this paper is about.

Why would you want to use MIDI with TouchOSC anyway? It is true that MIDI is a much more limited format for control data than OSC, but you may find the conversion from/to MIDI convenient when interfacing TouchOSC to existing MIDI applications and hardware, such as synthesizers, algorithmic composition and music notation software, as well as DAW

4 http://www.humatic.de/htools/touchdaw/

5 http://burnsmod.com/software/oscpad.html

6 http://charlie-roberts.com/Control/

7 http://www.cs.berkeley.edu/~lazzaro/rtpmidi/
(digital audio workstation) and sequencer programs. In particular, the conversion enables you to record control and automation data with DAW and sequencer programs, which typically offer good facilities for recording, playing back and editing MIDI sequences, but often provide only limited support for OSC, if at all.

So there are plenty of use cases for TouchOSC MIDI on Linux. Given that neither the proprietary TouchOSC Bridge protocol nor RTP-MIDI will work for our purposes, the most straightforward solution is to take the MIDI mapping information available in TouchOSC layout files and use it to convert OSC messages to/from MIDI automatically. This approach obviously has some shortcomings compared to the “official” TouchOSC MIDI Bridge, which connects directly to the TouchOSC app on the device. In particular, it requires a working OSC connection, and the TouchOSC layouts on the device and the host side must match up. But this doesn’t seem to be much of an impediment, and in any case it is better than not having any MIDI connectivity at all.

Our current implementation of the TouchOSC MIDI bridge for Linux uses Miller Puckette’s Pd a.k.a. Pure Data⁸, an interactive visual programming environment for computer music and multimedia applications. This makes it easy to create a working prototype of the software and also opens up the interface, so that users can modify details of the implementation inside Pd. However, the core code of our solution (which is written in the author’s Pure programming language [1]) could certainly be massaged into a stand-alone program which works outside the Pd environment.

2 TouchOSC Layouts 

Let us begin with a brief overview of TouchOSC layouts. For further details we refer the reader to the documentation available at the TouchOSC website [4].

Layouts are created with the TouchOSC editor, which can store them in zipped XML files or transfer them directly to a TouchOSC instance running on a device.

Figure 1 shows one page of a typical layout, as it is rendered on a device (an Android tablet in this case). When creating a layout with the editor, the user can choose from a built-in collection of various GUI widgets such as faders,

8 http://puredata.info/



Figure 1: TouchOSC layout. 


rotary controls, push and toggle buttons and 
XY pads. These can be placed freely on the 
screen. A layout may consist of multiple pages 
which can be selected using the tabs at the top 
of the screen. 

Each TouchOSC widget has one or more OSC messages associated with it, which are emitted when the status of the widget changes in some way (button pressed, fader moved, etc.). Conversely, OSC messages can also be transmitted to the device in order to change the current value of a widget. The following status variables are supported by most widgets:

• x, y: x is the primary value of a control, such as the value of a fader or a rotary, or the status of a button (0 = off, 1 = on). XY pads have a secondary y value, so they encode two values x and y (position of the control along the x and y axis, respectively) at the same time. These variables are for both input and output, i.e., they are transmitted to the host in response to a touch event, but the host can also send them back to the device in order to change the value. The latter is useful for setting up presets or providing visual feedback on the device for some host-side operations.
OSC message               Meaning
/1                        first page
/1/fader1 0.1             value of fader 1 on the first page
/1/fader1/color red       color of fader 1 (input only)
/1/fader1/z 1             touch variable of fader 1 (output only)
/1/xy1 0.1 0.7            x, y values of an XY pad
/1/multifader1/1 0.1      value of the first subcontrol of a multi-fader
/1/multifader1/1/z 1      the subcontrol’s z value
/1/multixy1/1 0.1 0.7     x, y values of the first subcontrol of a multi-XY pad
/1/multipush1/2/3 0.1     value of the subcontrol in column 2, row 3

Figure 2: TouchOSC message examples.


• z is the touch variable, which is 1 if the widget is being touched and 0 otherwise. In the case of a push button this will be the same as the primary value of the widget, but this value is output-only (it cannot be changed by transmitting a corresponding OSC message to the device).

• c is the color variable. TouchOSC offers a built-in palette of nine different colors which are usually set when editing the layout. But the color can also be changed dynamically by transmitting a corresponding OSC message to the device. This variable is input-only.

The OSC address of a widget can either be assigned automatically by the TouchOSC editor (in which case it takes the form /n/widget-name, where n is the number of the page on which the widget is located), or the user can set it manually to any valid OSC address string. This address is used for the primary widget value(s) (x and y), whereas the z and c values are addressed by tacking /z or /color onto the OSC address. The OSC value range of the numeric variables is 0-1 by default, but this can be adjusted in the editor. The color variables have symbolic values such as red, green, etc. in the OSC encoding.

Faders, buttons and XY pads also have multi-widget variations which consist of multiple controls of the same type making up a single widget. In this case the individual controls have separate OSC addresses of the form /widget-addr/i with an index i ranging from 1 to the number of subcontrols, or (in the case of multi-button widgets) /widget-addr/i/j, where i denotes the column and j the row index (note that the column index comes first, even though TouchOSC arranges the subcontrols in row-major order internally).

Layout pages themselves also have an OSC address (by default, this will be simply /1, /2, etc.). A message with just the OSC address (without any parameters) will be emitted whenever the page is clicked in the tab strip, and the host can also send a message of this form to change the page that’s currently displayed on the device.

Figure 2 summarizes the syntax of typical 
TouchOSC messages and their meaning. 
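As a concrete reading of the address conventions above, the following Python sketch splits a TouchOSC address into page, widget, subcontrol index and status variable. It is not part of pd-touchosc; the `parse_address` function and `TouchOSCAddress` type are hypothetical, and the sketch assumes the default address form generated by the editor, since manually assigned addresses can be arbitrary strings that only the layout file can interpret.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TouchOSCAddress:
    """Hypothetical decomposition of a default-form TouchOSC address."""
    page: str               # page name, e.g. "1"
    widget: Optional[str]   # None for a bare page message such as "/1"
    index: Tuple[int, ...]  # () single control, (i,) multi-control, (column, row) multi-button
    variable: str           # "page", "xy" (primary value), "z" (touch) or "color"


def parse_address(addr: str) -> TouchOSCAddress:
    """Split /page[/widget[/i[/j]][/z|/color]] into its components.

    Assumes the address was generated automatically by the editor;
    manually assigned addresses need the layout file to interpret.
    """
    parts = [p for p in addr.split("/") if p]
    if not parts:
        raise ValueError(f"empty OSC address: {addr!r}")
    page, rest = parts[0], parts[1:]
    if not rest:
        # A message consisting of just the page address selects that page.
        return TouchOSCAddress(page, None, (), "page")
    variable = "xy"
    if len(rest) > 1 and rest[-1] in ("z", "color"):
        variable = rest.pop()
    widget, indices = rest[0], rest[1:]
    if len(indices) > 2 or not all(i.isdigit() for i in indices):
        raise ValueError(f"unexpected address layout: {addr!r}")
    return TouchOSCAddress(page, widget, tuple(int(i) for i in indices), variable)
```

For example, `parse_address("/1/multipush1/2/3")` yields the index `(2, 3)`, column first, exactly as TouchOSC sends it.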

3 MIDI Assignments 

The TouchOSC editor allows MIDI messages to be assigned to any status variable of a widget in a layout. The details are a little intricate at first because of the distinct characteristics of the various widgets and the different kinds of MIDI messages, but they work in a rather intuitive and straightforward manner once the user is familiar with the available configuration options. TouchOSC supports all the different types of voice messages MIDI has on offer, as well as the sequencer-related system real-time messages (start, stop and continue).

The start, stop and continue messages offer no further configuration options. They can only be assigned to on/off variables, i.e., the primary value of buttons, or the touch value of any control. In our implementation, this kind of MIDI message is triggered whenever the corresponding control variable goes to a non-zero value.⁹
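The triggering rule for start, stop and continue can be modelled in a few lines of Python. This is a hypothetical sketch: the `sequencer_triggers` name and the `compat` flag are ours (they do not correspond to the actual compile-time option), and it simply lists the positions in a stream of control values at which a sequencer message would be emitted.

```python
from typing import Iterable, List


def sequencer_triggers(values: Iterable[float], compat: bool = False) -> List[int]:
    """Return the indices at which a start/stop/continue message is sent.

    Default: send whenever the control variable goes to a non-zero value.
    compat=True models the behavior reported for the official bridge,
    which sends the message on every status change, including the drop
    back to zero. Name and flag are hypothetical.
    """
    return [i for i, v in enumerate(values) if compat or v != 0]
```

For a push button that is pressed and released twice, `[1, 0, 1, 0]`, the default rule fires twice while the compatibility mode fires four times, which is the doubled sequencer message described in the footnote.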

Voice messages generally map a control variable to the last data byte of the message. This will be the note velocity or control value for
9 Note that, in contrast, the official TouchOSC MIDI Bridge seems to emit the message for each status change, i.e., also when the variable drops back to zero. We do not consider this behavior very useful, however, as it causes a sequencer message to be sent twice when pressing and releasing a push button. Nevertheless, there’s a compile-time option in our code which provides compatibility with the TouchOSC MIDI Bridge in this respect if this is needed.
No.  Type              Channel  Fixed Data Value            Mapped Data Range
0    control change    1-16     controller number (0-127)   controller value (0-127)
1    note              1-16     note number (0-127)         velocity (0-127)
2    program change    1-16     -                           program number (0-127)
3    start             -        -                           -
4    stop              -        -                           -
5    continue          -        -                           -
6    key pressure      1-16     note number (0-127)         velocity (0-127)
7    channel pressure  1-16     -                           velocity (0-127)
8    pitch bend        1-16     -                           pitch bend (0-16383)

Figure 3: TouchOSC MIDI mappings.


voice messages having two data bytes, and the single data byte of channel pressure (aftertouch) and program change messages. The pitch bend message gets special treatment; in this case the value of the control variable is mapped to the entire 14-bit range of 0-16383. (In MIDI this value is the combination of the two data bytes of the message, hence the 14-bit value range.)

Considering a variable with the source (OSC) range from $x_1$ to $x_2$ and the target (MIDI) range from $y_1$ to $y_2$, the variable (OSC) value $x$ is mapped to:

$$y = y_1 + (y_2 - y_1)\,\frac{x - x_1}{x_2 - x_1}.$$

In the case of the default OSC value range ($x_1 = 0$, $x_2 = 1$), this can be simplified to:

$$y = y_1 + (y_2 - y_1)\,x.$$

The resulting value $y$ is then rounded to an integer and clamped to the MIDI data byte range (or the 14-bit range for pitch bend messages). By default, $y_1 = 0$ and $y_2 = 127$ ($y_2 = 16383$ for a pitch bend message).
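The mapping, rounding and clamping translate directly into code. The following Python sketch assumes round-half-up rounding (the text only says the value is rounded, not which rounding mode is used) and adds an inverse mapping for the MIDI-to-OSC direction; that inverse is our assumption about how the reverse conversion works, not something stated above. The function names are hypothetical.

```python
import math


def osc_to_midi(x: float, x1: float = 0.0, x2: float = 1.0,
                y1: int = 0, y2: int = 127, max_value: int = 127) -> int:
    """Map OSC value x from [x1, x2] linearly onto [y1, y2], then round
    and clamp to 0..max_value (127 for data bytes, 16383 for pitch bend)."""
    if x2 == x1:
        raise ValueError("degenerate OSC range: x1 == x2")
    y = y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    # Round half up; Python's built-in round() would round half to even.
    y = math.floor(y + 0.5)
    return max(0, min(max_value, y))


def midi_to_osc(y: int, x1: float = 0.0, x2: float = 1.0,
                y1: int = 0, y2: int = 127) -> float:
    """Inverse linear mapping from [y1, y2] back to [x1, x2] (assumed for toosc)."""
    if y2 == y1:
        raise ValueError("degenerate MIDI range: y1 == y2")
    return x1 + (x2 - x1) * (y - y1) / (y2 - y1)
```

With the defaults, a fader at 0.5 maps to 64, and out-of-range OSC values such as 2.0 are clamped to 127.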

For voice messages the user may configure the (MIDI) value range for the control variable, the (fixed) value of the MIDI channel and the (fixed) value of the first data byte of the message, if any.

Figure 3 summarizes the MIDI conversions supported in the latest TouchOSC version. The MIDI message type numbers in the first column are as given inside the XML layout file. (TouchOSC uses its own encoding for the message types, which has nothing to do with the actual MIDI status bytes of these messages.)
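To illustrate how the type numbers in Figure 3 differ from the wire encoding, the sketch below pairs each TouchOSC code with the standard MIDI status byte it corresponds to. Channel voice messages carry the channel (1-16) in the low nibble of the status byte; the real-time messages have no channel. The table name and the `status_byte` helper are hypothetical.

```python
from typing import Dict, Tuple

# TouchOSC type code (Figure 3) -> (name, MIDI status byte for channel 1
# or the system real-time byte). Hypothetical table for illustration.
TOUCHOSC_TYPES: Dict[int, Tuple[str, int]] = {
    0: ("control change", 0xB0),
    1: ("note", 0x90),            # note-on status; velocity 0 conventionally means note-off
    2: ("program change", 0xC0),
    3: ("start", 0xFA),
    4: ("stop", 0xFC),
    5: ("continue", 0xFB),
    6: ("key pressure", 0xA0),    # polyphonic aftertouch
    7: ("channel pressure", 0xD0),
    8: ("pitch bend", 0xE0),
}


def status_byte(type_code: int, channel: int = 1) -> int:
    """Return the MIDI status byte for a TouchOSC type code and channel 1-16."""
    _, status = TOUCHOSC_TYPES[type_code]
    if status >= 0xF0:  # system real-time messages carry no channel
        return status
    if not 1 <= channel <= 16:
        raise ValueError(f"MIDI channel out of range: {channel}")
    return status | (channel - 1)
```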

Note that while it’s possible to map a variable to the velocity of a note or the value of a control change message, you cannot map it to the note number or MIDI controller number. While this kind of setup might occasionally be useful, TouchOSC doesn’t allow it. Still, it’s possible to implement most kinds of controller configurations, such as mixer interfaces, DJ controls and even MIDI keyboards, without much trouble.

The (OSC) source value for a MIDI control can be any of the x, y, z and c status variables. In a multi-control widget, each of the subcontrols has its own MIDI assignments. The TouchOSC editor allows you to pick those from a dropdown list in the MIDI properties (in the left side pane of the editor) after selecting a widget.

Note that it’s possible to map the color (c) variable as well, so that you can change the color of a widget by sending a corresponding MIDI message. In this case the MIDI value range is fixed at 0-8, where 0 denotes red, 1 green, etc. Another special case that deserves mention is the mapping of page messages (/1, /2, etc. in OSC). You can map any of the MIDI voice messages to a given page, so that a MIDI message will be emitted if the user switches tabs on the device, and the current page will be switched when the MIDI message is sent to the device.

Our implementation fully supports all types of MIDI assignments described above. Note, however, that in order to receive touch messages (z variable), the corresponding OSC message type must be enabled in TouchOSC’s OSC configuration dialog.

4 Interfacing TouchOSC and Pd 

Our TouchOSC MIDI Bridge does its job by converting OSC messages from/to MIDI, and thus a working OSC connection between Pd and the device running TouchOSC is required. While Pd doesn’t offer any built-in OSC support, this can easily be added by means of corresponding Pd externals (plugins). Two well-known external libraries for this purpose are OSCx and mrpeach. These are both included in Pd distribution packages such as Pd-Extended¹⁰ and Pd-L2Ork¹¹. The mrpeach externals offer additional features, such as the ability to access the source address of an incoming OSC message, which is useful in order to set up bidirectional communication in an automatic fashion. Our sample patches employ this feature and are thus written using the mrpeach externals.

To make these facilities available, you only need to make sure that you have the mrpeach externals installed and on your Pd library search path. This should already be the case if you’re running Pd-Extended or Pd-L2Ork. The only other required setup is to verify your TouchOSC configuration on the device. In the OSC setup, the host address should be set to the IP address of the computer running Pd (under Linux, you can find this by running the ifconfig program). You should also verify that the outgoing and incoming ports in the TouchOSC configuration are set to 8000 and 9000, respectively, since these are the default values our sample patches assume. These port numbers match the TouchOSC defaults, however, so chances are that you only need to enter the correct host address on the device.¹²
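What travels over these UDP ports are OSC packets. As a rough illustration of the wire format that the mrpeach externals decode, here is a minimal Python encoder and decoder following the OSC 1.0 encoding rules (strings null-terminated and padded to a multiple of four bytes, big-endian float32 arguments). The function names are hypothetical, and restricting the arguments to floats is our simplification: it suffices for numeric control values but not, for example, for the symbolic color values.

```python
import struct
from typing import List, Tuple


def _pad(data: bytes) -> bytes:
    """Null-terminate and pad to a multiple of 4 bytes, as OSC 1.0 requires."""
    data += b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def encode_osc(address: str, *args: float) -> bytes:
    """Encode an OSC message whose arguments are all float32 values."""
    tags = "," + "f" * len(args)
    payload = b"".join(struct.pack(">f", a) for a in args)
    return _pad(address.encode("ascii")) + _pad(tags.encode("ascii")) + payload


def decode_osc(packet: bytes) -> Tuple[str, List[float]]:
    """Decode a message with float32 arguments only (simplified sketch)."""
    def read_string(offset: int) -> Tuple[str, int]:
        end = packet.index(b"\x00", offset)
        text = packet[offset:end].decode("ascii")
        # Skip the terminator and the padding up to the next 4-byte boundary.
        return text, end + 1 + (-(end + 1) % 4)

    address, offset = read_string(0)
    tags, offset = read_string(offset)
    values = []
    for tag in tags[1:]:
        if tag != "f":
            raise ValueError(f"unsupported type tag: {tag}")
        (value,) = struct.unpack_from(">f", packet, offset)
        values.append(value)
        offset += 4
    return address, values
```

A test message could then be sent to the device with an ordinary UDP socket, e.g. `sock.sendto(encode_osc("/1/fader1", 0.5), (device_ip, 9000))`, where `sock` and `device_ip` are placeholders for your own socket and the device’s address.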

5 The MIDI Bridge 

Our TouchOSC MIDI bridge is distributed in the form of a Pd external library called touchosc, which implements two objects, tomidi and toosc. Both objects take the name of a TouchOSC layout as their single creation argument. Thus, for instance, to have OSC messages converted to MIDI using the MIDI assignments in a layout named sample.touchosc, you’d create the object in Pd as tomidi sample. To make this work, the layout file needs to be in the same directory as the Pd patch containing the object. You can also use a full path name including the .touchosc extension, enclosed in double quotes, as in tomidi "/some/path/sample.touchosc".
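The lookup rule for the creation argument can be sketched in Python as follows. This is a hypothetical illustration, not the actual Pure code: `resolve_layout` appends the .touchosc extension to a bare name and looks it up next to the patch, and `load_layout` opens the zipped layout and parses the XML document inside. Resolving relative paths that already carry the extension against the patch directory, and the archive holding a single XML file, are both our assumptions.

```python
import os
import xml.etree.ElementTree as ET
import zipfile


def resolve_layout(arg: str, patch_dir: str) -> str:
    """Map a tomidi/toosc creation argument to a layout file path.

    A bare name such as "sample" is looked up as sample.touchosc in the
    patch directory. A name ending in .touchosc is used as given if it is
    absolute; relative ones are resolved against the patch directory
    (assumption).
    """
    if arg.endswith(".touchosc"):
        return arg if os.path.isabs(arg) else os.path.join(patch_dir, arg)
    return os.path.join(patch_dir, arg + ".touchosc")


def load_layout(path: str) -> ET.Element:
    """Return the root element of the XML document in a zipped layout file."""
    with zipfile.ZipFile(path) as archive:
        xml_names = [n for n in archive.namelist() if n.endswith(".xml")]
        if not xml_names:
            raise ValueError(f"no XML document found in {path}")
        return ET.fromstring(archive.read(xml_names[0]))
```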

The operation of the Pd objects is fairly straightforward and doesn’t require any additional configuration. Both objects provide a single inlet and a single outlet. The tomidi

10 http://puredata.info/downloads/pd-extended

11 http://puredata.info/downloads/Pd-L2Ork

12 TouchOSC also supports Zeroconf, which is implemented in the latest versions of our software as well. This makes it much easier to set up the network connections; see Section 6. But if necessary, you can also configure the network connection by manually entering IP addresses and port numbers as explained above.


Message Type       Format
control change     ctl v n c
note               note n v c
program change     pgm n c
key pressure       polytouch v n c
channel pressure   touch v c
pitch bend         bend v c
start              start
stop               stop
continue           cont

Figure 4: MIDI representation of the Pd TouchOSC bridge. n denotes the note or controller number, v the velocity or controller value, c the MIDI channel number.

object takes OSC messages on its inlet and produces the corresponding MIDI messages on its outlet. The toosc object does the reverse operation, mapping MIDI messages to their OSC counterparts. The conversion is fully automatic once you’ve configured the MIDI mappings in your TouchOSC layout. No manual processing of OSC messages is required.

OSC messages are represented in the same 
symbolic format that’s also used by the OSCx 
and mrpeach externals, so the output of 
unpackOSC (mrpeach) or dumpOSC (OSCx) can 
be piped directly into tomidi, while the output 
of toosc is ready to be used with mrpeach’s 
packOSC or OSCx’s sendOSC. 

MIDI messages are also encoded in a symbolic format, i.e., as Pd meta messages. As Pd doesn’t have a standard representation of MIDI messages other than as a numbers graveyard, we invented our own, but it’s fairly straightforward if you’re familiar with Pd’s objects for MIDI input and output. A summary of the message syntax can be found in Figure 4. The format and, in particular, the somewhat idiosyncratic order of arguments has been designed so that it’s easy to dispatch on the different message types using a Pd route object and pass the remaining data to the corresponding MIDI output objects. Conversely, MIDI messages can be received from Pd’s MIDI input objects and converted to our format by just packing together the data and tacking on the proper message selector. Two helper patches, midi-input and midi-output, are included in the distribution to do this.
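To make the argument order of Figure 4 concrete, the following Python sketch converts a message in this symbolic format into raw MIDI bytes, dispatching on the selector much as a Pd route object would. The `to_midi_bytes` function is hypothetical; it assumes channels 1-16, program numbers 0-127 and pitch bend values 0-16383 as listed in Figure 3, and it does not validate the data ranges.

```python
from typing import List


def to_midi_bytes(message: str) -> List[int]:
    """Convert a Figure 4 style message (e.g. "ctl 64 7 1") to raw MIDI bytes."""
    selector, *fields = message.split()
    args = [int(f) for f in fields]

    def status(base: int, channel: int) -> int:
        if not 1 <= channel <= 16:
            raise ValueError(f"MIDI channel out of range: {channel}")
        return base | (channel - 1)

    if selector == "ctl":        # ctl v n c
        v, n, c = args
        return [status(0xB0, c), n, v]
    if selector == "note":       # note n v c
        n, v, c = args
        return [status(0x90, c), n, v]
    if selector == "pgm":        # pgm n c (program number assumed 0-127)
        n, c = args
        return [status(0xC0, c), n]
    if selector == "polytouch":  # polytouch v n c
        v, n, c = args
        return [status(0xA0, c), n, v]
    if selector == "touch":      # touch v c
        v, c = args
        return [status(0xD0, c), v]
    if selector == "bend":       # bend v c, v split into 7-bit LSB and MSB
        v, c = args
        return [status(0xE0, c), v & 0x7F, (v >> 7) & 0x7F]
    realtime = {"start": 0xFA, "stop": 0xFC, "cont": 0xFB}
    if selector in realtime and not args:
        return [realtime[selector]]
    raise ValueError(f"unrecognized message: {message!r}")
```

Note how the value comes first in ctl, polytouch and touch but the note number comes first in note; the sketch simply mirrors that order.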

The distribution also includes a helper patch named touchosc-bridge (cf. Figure 5), which takes care of all the nitty-gritty details of setting up incoming and outgoing OSC connections and provides a pair of tomidi and toosc objects to handle conversions in both directions. We describe this in the following section, where we cover the installation and usage of the MIDI bridge with Pd.

[Patch diagram: touchosc-bridge layout-file [inport outport]. The patch requires the cyclone and mrpeach externals; 8000 is the default input port and 9000 the default output port, which can be changed with the second and third creation parameters.]

Figure 5: touchosc-bridge patch (simplified version).

The latest versions of the touchosc library also include a third object, oscbrowser, which lets you discover available OSC clients and can also publish its own OSC service using Zeroconf. This object is used to implement the Zeroconf support in the touchosc-bridge patch, but will also be useful in its own right for Pd users who wish to implement OSC applications; please check the pd-touchosc sources [3] for details.

6 Installation and Usage 

Our TouchOSC MIDI bridge library for Pd is written in the author’s Pure programming language [1], so you first need to install the Pure interpreter along with the pd-pure, pure-stldict and pure-xml addon modules. The Pure website will tell you how to do this. Binary packages for various popular Linux distributions such as Arch, Fedora and Ubuntu are also available, as well as ports for Mac OS X and BSD systems. Note that the pd-pure module is required to run any Pure externals with Pd, and needs to be enabled in Pd; please check the pd-pure documentation for details [2].

Next, to install our TouchOSC MIDI bridge, go find the pd-touchosc repository on Bitbucket [3] and clone the repository, or download it as a zip archive and extract it on your hard disk. At the repository website you can also find detailed installation instructions in the README.md file. Basically, you’ll have to chdir to the source directory and run make && sudo make install. If you have Pd and all the other requisite software installed, this should build the external library and install it under your /usr/lib/pd/extra directory, along with some helper patches and examples.¹³ Then fire up Pd and add touchosc to your startup libraries. The next time you start up Pd, you should see a message in the Pd console showing that the touchosc library was loaded and registered with pd-pure.

Last but not least, you’ll need TouchOSC, of 
course. You can grab the mobile app on Google 
Play or the iTunes Store and install it on your 
Android or iOS device. The TouchOSC editor 
can be downloaded for free on the TouchOSC 
website; it’s a Java program, so you need to 
have a suitable Java runtime installed to use it. 

The recommended way to run pd-touchosc is via the included touchosc-bridge helper patch. This patch sets up a pair of tomidi and toosc objects along with all the required OSC input and output machinery to connect with TouchOSC. The current version of the touchosc-bridge patch is depicted in Figure 5.¹⁴ The patch is normally invoked with a single creation argument, the TouchOSC layout to use (in the same format as described in the previous section). Optionally, you can also configure the TouchOSC UDP ports by specifying these as the second and third argument.

The patch also detects the source IP address of incoming OSC messages and connects its output to it, so that after sending some data from the device, the reverse connection should also work in an automatic fashion. Note that TouchOSC has an option which makes it send out OSC /ping messages at regular intervals. If you enable this option, the touchosc-bridge patch will automatically set up its output connection as soon as it receives the first /ping message from the device.
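The reply logic described here is simple enough to state in a few lines. This Python sketch is a hypothetical model, not the patch itself: the output host is taken from the sender of any incoming packet (a /ping message works like any other), while the output port stays at the configured outgoing port, 9000 by default. The class and method names are ours.

```python
from typing import Optional, Tuple

Address = Tuple[str, int]  # (host, port)


class ReplyTarget:
    """Track where OSC output should go, based on incoming packets.

    Hypothetical model: the host comes from the sender of the latest
    incoming packet; the port is the configured outgoing port.
    """

    def __init__(self, out_port: int = 9000) -> None:
        self.out_port = out_port
        self.target: Optional[Address] = None

    def on_packet(self, sender: Address) -> bool:
        """Record the sender's host; return True if the output target changed."""
        new_target = (sender[0], self.out_port)
        changed = new_target != self.target
        self.target = new_target
        return changed
```

The `sender` tuple has the `(host, port)` shape that `socket.recvfrom` returns for an IPv4 UDP socket, so a receive loop could pass it straight through; the sender’s own source port is ignored.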

Getting the network connections set up is even easier with Zeroconf. The latest version of pd-touchosc supports this via the Avahi Zeroconf daemon available for Linux and other Unix-like systems.¹⁵ If you have Avahi installed and its daemon running, a client named pd-touchosc should show up in the host list in TouchOSC’s OSC configuration dialog. Click on that to have the network address and port
13 This will work with vanilla Pd. For Pd-Extended and Pd-L2Ork you’ll have to specify the Pd flavor using the PD make variable, e.g.: make PD=pd-extended && sudo make install PD=pd-extended.

14 For the sake of clarity, the figure actually shows a simplified version of the patch, available in the distribution as touchosc-bridge-simple.pd, which doesn’t include Zeroconf support. The full version of the patch can be found in the distribution as touchosc-bridge.pd.

15 http://avahi.org/


number filled in. On the host side, the full version of the touchosc-bridge patch offers the option to browse for available OSC services using Zeroconf. There’s a toggle which lets you enable this; touchosc-bridge will then connect to the first OSC service available in the network (other than pd-touchosc itself). If there’s more than one such service, you can cycle through the available services with the other GUI controls of the patch; see Figure 6. E.g., if you’re running the Android version of TouchOSC, you’ll have to look out for services named Android (TouchOSC) or similar (depending on which ZeroConf Name you chose in TouchOSC’s OSC configuration) and pick the one that you want.
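The host-side service selection amounts to filtering and cycling through a list. The Python sketch below models only that logic; the Zeroconf browsing itself (done via Avahi in pd-touchosc) is not shown, the class name is hypothetical, and wrapping around at the end of the list is our assumption about how cycling behaves.

```python
from typing import List, Optional


class ServiceCycler:
    """Pick an OSC service from a browse result, skipping our own service.

    Starts at the first service other than own_name; next() steps through
    the rest and wraps around at the end of the list (assumed behavior).
    """

    def __init__(self, services: List[str], own_name: str = "pd-touchosc") -> None:
        self.services = [s for s in services if s != own_name]
        self.index = 0

    def current(self) -> Optional[str]:
        return self.services[self.index] if self.services else None

    def next(self) -> Optional[str]:
        if self.services:
            self.index = (self.index + 1) % len(self.services)
        return self.current()
```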

The left inlet/outlet pair of the patch is for sending MIDI messages to and receiving them from the device. In addition, the right inlet/outlet pair can be used to send and receive untranslated OSC messages. If you connect the left inlet and outlet to the midi-input and midi-output patches provided in the distribution, and route Pd’s MIDI inputs and outputs to your MIDI devices and/or applications as needed, you should be able to set up Pd as a simple TouchOSC MIDI bridge with little effort. Or, if your computer music application is implemented as a Pd patch, you can use touchosc-bridge directly in your patch and hook it up to your existing control logic.

Figure 6 shows how touchosc-bridge can be employed in a simple test patch. Several sample patches and corresponding TouchOSC layouts are also included in the distribution. To get started, just download the sample layouts to your device, open the corresponding patches with Pd and kick the tires to see how things work.

7 Conclusion 

We presented a TouchOSC MIDI bridge implementation which works on Linux, running inside Miller Puckette’s Pd environment. This software allows you to convert between TouchOSC-formatted OSC and MIDI messages, following the MIDI mappings defined in a TouchOSC layout. It offers pretty much the same functionality as the official TouchOSC MIDI Bridge application, which is only supported on Mac and Windows at this time. The main differences from hexler’s bridge are that it requires an actual OSC connection to the device and the TouchOSC layout file on the host side to work.
Figure 6: Sample patch using the touchosc-bridge abstraction.


To the author’s knowledge, this is the first (and at the time of this writing, the only) fully automatic TouchOSC MIDI bridge application on Linux. Compared to the “real” TouchOSC MIDI Bridge, it also has the advantage that it is available under an open source license and doesn’t rely on any proprietary and undocumented protocols, so it can easily be customized by the user.

Future work should be directed towards turning the software into a stand-alone program which can be run more easily by non-Pd users. In principle, it should also be possible to adjust our implementation to other OSC applications such as Control and the Lemur, although this will require some refactoring of the layout parsing code.
Abstract

This paper introduces the LV2 Atom extension, a simple yet powerful data model designed for advanced control of audio plugins or other real-time applications. At the most basic level, an atom is a standard header followed by a sequence of bytes. A standard type model can be used for representing structured data which is meaningful across projects. This model is currently used by several projects for various applications including state persistence, time synchronisation, and network-transparent plugin control. Atoms are intended to form the basis of future standard protocols to increase the power of host:plugin, plugin:plugin, and UI:plugin interfaces.

Keywords 

LV2, plugin, data, state, protocol 

1 Introduction 

LV2 is a portable plugin standard for audio systems, similar in scope to LADSPA, VST, AU, and others. It defines a C API for code and a format for data files which collectively describe a plugin. The core LV2 API is similar in power to LADSPA, but extensions can support more advanced functionality. This allows the interface to evolve and accommodate the needs of real software as they arise.

LV2 is supported by many applications, including Digital Audio Workstations like Ardour [Davis and others, 2014], hardware effects processors like MOD [Ceccolini and Germani, 2013], and signal processing languages like Faust [Graf, 2013].

One key piece of functionality LV2 adds to LADSPA is the ability to transmit events. This is most commonly used to communicate via MIDI [MID, 1983] for playing notes, selecting programs, etc. MIDI is nearly ubiquitous in musical equipment, but has significant limitations [Moore, 1988]. Many applications require a more powerful model to express and manipulate state. For example, “load sample /media/bonk.wav”, a typical operation in some audio software, cannot be expressed in standard MIDI. Other protocols like OSC [Wright, 1997] are more powerful, but are still designed around commands, which limits their applicability and ability to express structured data.

This paper introduces the LV2 Atom extension [Robillard, 2012b], a simple yet powerful data model designed for advanced control of LV2 plugins or other real-time applications. Atoms serve both as a model for representing state, and a protocol for accessing or manipulating it. This includes primitive values like numeric controls or file names, but the model-based approach allows developers to work with more sophisticated data as well.

The key distinction between MIDI or OSC messages and atoms is that atoms are not just commands, but a general data format. This paper aims to show that building on the foundation of a solid data model is more elegant and powerful than command-based protocols. The idea is conceptually similar to the popular use of JSON [Crawford, 2006] in the web community: define a data model for representing arbitrary information, then construct messages within that data model.

However, atoms are not introduced to the exclusion of other protocols. In fact, MIDI messages are transmitted to and from plugins as a particular type of atom. At the lowest level, atoms are a sequence of bytes (or a chunk) with a standard header. On top of this, a type model is defined which allows complex structures to be built from a few standard primitive and container types. This model has several advantages, including extensibility, support for round-trip portable serialisation, and natural expression in plugin data files.

There are two aspects to the LV2 atom specification: the low-level mechanics (Section 2) define the binary format of atoms and how they may be used, while the high-level semantics (Section 3) define a type model built upon this binary format. Using this model, projects can communicate meaningful structures at a conceptually high level, while the actual mechanics involved are simply the copying of small chunks. This approach to plugin control has many applications (Section 4) in current projects, which typically use the provided convenience APIs for reading and writing atoms (Section 5). Ultimately, atoms are intended to form the basis of future work (Section 6) designing standard protocols for advanced plugin control.

2 Mechanics 

2.1 Atom Definition 

An LV2 atom is a 64-bit aligned chunk of memory 
that begins with a 32-bit size and type: 

typedef struct {
    uint32_t size;
    uint32_t type;
} LV2_Atom;

This atom header is immediately followed by the body, which is size bytes long. Atoms are, by definition, Plain Old Data (POD): contiguous chunks of memory that can safely be copied bytewise.¹ At the most basic level, this is all there is to atoms.
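Here is a minimal sketch of what this definition implies, using only the struct above. An atom occupies sizeof(LV2_Atom) + size bytes. The next atom in a buffer is placed at the following 64-bit boundary. Because atoms are POD, an atom can be duplicated with memcpy without knowing its type. The helper names atom_total_size, atom_pad_size and atom_copy are hypothetical. The LV2 utility header offers comparable functions, so check its documentation for the exact names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch only: the helper names below are hypothetical. */

/* Atom header, exactly as defined above. */
typedef struct {
    uint32_t size; /* body size in bytes, excluding this header */
    uint32_t type; /* URID of the body type */
} LV2_Atom;

/* Bytes occupied by header plus body. */
static uint32_t atom_total_size(const LV2_Atom* atom)
{
    return (uint32_t)sizeof(LV2_Atom) + atom->size;
}

/* Round a size up to the next multiple of 8 bytes (64-bit alignment). */
static uint32_t atom_pad_size(uint32_t size)
{
    return (size + 7U) & ~7U;
}

/* Copy an atom bytewise into dst, which holds dst_capacity bytes.
 * The type is never inspected: POD atoms can be copied blindly.
 * Returns the number of bytes copied, or 0 if dst is too small. */
static uint32_t atom_copy(void* dst, uint32_t dst_capacity, const LV2_Atom* src)
{
    assert(dst != NULL && src != NULL);
    const uint32_t total = atom_total_size(src);
    if (total > dst_capacity) {
        return 0;
    }
    memcpy(dst, src, total);
    return total;
}
```

For example, an atom with a 4-byte body occupies 12 bytes. When atoms are packed one after another, it is padded to 16.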

Types are assigned dynamically and are not restricted to any fixed set. Developers can define new atom types, though all types are required to be POD. Any atom can thus be copied or stored, even by an implementation which does not understand its type. Among other advantages, this makes it possible for hosts to transmit atoms between plugins without explicitly supporting each type used. Similarly, generic plugins like event routers, multiplexers, or delays can work with any atom. Developers are free to send complex data between plugins, or between UIs and plugins, without being held back by missing standards or host support. Section 3.3 explains in detail how this decentralised extensibility is achieved.

Note, however, that atoms are only POD by definition, not necessarily portable: atoms may contain architecture-specific data like integers with native endianness. The atom specification includes a set of standard types which should be used where persistence or interoperability are important (see Section 3.1).

2.2 Communication via Ports 

Plugins can send or receive atoms via an AtomPort which (like any LV2 port) is either an input or an output. An AtomPort is connected directly to an LV2_Atom (just as a standard LADSPA or LV2 control port is connected directly to a float).

An AtomPort can be used with any atom type. Plugins can specify which types are supported using the atom:bufferType property in their data files. Several types may be supported by a single port.

1 Type 0 has been reserved for a special reference type, in 
case a need for non-POD communication arises in the future. 


Input logistics are straightforward: the host connects the input to an atom before calling the plugin’s process() method.

Outputs are slightly trickier, since the plugin must know how much space is available for writing, but atom types may have variable size. To resolve this, the host initialises the size field in the output buffer to the amount of available space before calling process(). Plugins read this value, then write a complete atom (including size and type) to the buffer before returning. For real-time support, as with audio, output buffer space is made available by the host before calling process(). By default, outputs are given the same amount of space as inputs of the same type, but plugins that require more space can request larger output buffers ahead of time.

Thus far, atoms have been described without referring to specific types. An AtomPort can be connected to any value, but since plugins process signals over a block of time, it is usually more useful for ports to contain many time-stamped atoms, or events. To achieve this, ports are connected to a Sequence atom. This is the mechanism commonly used by LV2 plugins to process streams of sample-accurate events (including MIDI) alongside audio. The following section describes the set of standard atom types, which includes primitives like Int and containers like Sequence.
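To make the Sequence mechanism concrete, the sketch below builds a Sequence in a plain buffer and then walks it. The body layout is our reading of atom.h and should be checked against it:

- a time-unit field and a padding field;
- then the events, each made of a 64-bit frame time stamp followed by an ordinary atom and padded to 64 bits.

SeqBody, EventHeader and the sequence_* helpers are hypothetical names. Real plugins would use the iteration utilities shipped with LV2 instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch only: layouts are assumed, names are hypothetical. */
typedef struct {
    uint32_t size;
    uint32_t type;
} LV2_Atom;

/* Assumed Sequence body header: time stamp unit (0 = audio frames). */
typedef struct {
    uint32_t unit;
    uint32_t pad;
} SeqBody;

/* Assumed event header: a time stamp, then an ordinary atom header. */
typedef struct {
    int64_t  frames;
    LV2_Atom body;
} EventHeader;

static uint32_t pad8(uint32_t size) { return (size + 7U) & ~7U; }

/* Start an empty Sequence; the buffer must be 64-bit aligned. */
static void sequence_init(LV2_Atom* seq, uint32_t sequence_urid)
{
    SeqBody* body = (SeqBody*)(seq + 1);
    seq->size  = (uint32_t)sizeof(SeqBody);
    seq->type  = sequence_urid;
    body->unit = 0;
    body->pad  = 0;
}

/* Append an event if it fits in `capacity` bytes; returns 1 on success. */
static int sequence_append(LV2_Atom* seq, uint32_t capacity, int64_t frames,
                           uint32_t type, const void* payload, uint32_t size)
{
    assert(seq != NULL && payload != NULL);
    const uint32_t offset = (uint32_t)sizeof(LV2_Atom) + seq->size;
    const uint32_t needed = pad8((uint32_t)sizeof(EventHeader) + size);
    if (offset + needed > capacity) {
        return 0;
    }
    EventHeader* ev = (EventHeader*)((uint8_t*)seq + offset);
    ev->frames    = frames;
    ev->body.size = size;
    ev->body.type = type;
    memcpy(ev + 1, payload, size);
    seq->size += needed;
    return 1;
}

/* First event, or NULL if the sequence is empty. */
static const EventHeader* sequence_first(const LV2_Atom* seq)
{
    const uint8_t* end = (const uint8_t*)(seq + 1) + seq->size;
    const uint8_t* p   = (const uint8_t*)(seq + 1) + sizeof(SeqBody);
    return (p + sizeof(EventHeader) <= end) ? (const EventHeader*)p : NULL;
}

/* Event following `ev`, or NULL at the end of the sequence. */
static const EventHeader* sequence_next(const LV2_Atom* seq, const EventHeader* ev)
{
    const uint8_t* end = (const uint8_t*)(seq + 1) + seq->size;
    const uint8_t* p   = (const uint8_t*)ev
                       + pad8((uint32_t)sizeof(EventHeader) + ev->body.size);
    return (p + sizeof(EventHeader) <= end) ? (const EventHeader*)p : NULL;
}
```

In this layout, each event with a 4-byte payload takes 24 bytes: a 16-byte header plus the payload, padded.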

3 Semantics 

3.1 Atom Types 

The structure of the atom type model is similar to JSON values or Erlang terms [Virding et al., 1996]: a few primitive types, and collections which can be used to build larger structures. The hierarchy of standard types defined in the atom extension² is shown in Figure 1.

Primitives represent a single value, and do not contain other atoms. The simplest types are primitives with fixed size, like Int. These types have a corresponding C struct defined in atom.h which completely describes their binary format, for example:

typedef struct {
    LV2_Atom atom;
    int32_t  body;
} LV2_Atom_Int;

The other Number types and Bool correspond to 
the C types with the same name, but have a precisely 
defined size on all platforms (32 and 64 bits). 

2 There is also a standard type for MIDI messages defined 
in the separate LV2 MIDI extension. 
Figure 1: The Atom type hierarchy. Abstract types are dashed, and collections are grey.

A URID is a URI which has been mapped to a 32-bit integer by the host. This facility allows URIs to be used conceptually, but with the performance of fixed-size integers (Section 3.3 explains the purpose of URIDs in more detail).
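The idea behind URIDs can be illustrated with a toy table that assigns each distinct URI a small integer. IDs start at 1, so that 0 can mean “unmapped”. In a real host, this service is provided through the map feature of the LV2 URID extension. UriMap, uri_map and uri_unmap below are hypothetical stand-ins, for illustration only. A plugin would typically map the URIs it needs once, at instantiation, and then compare plain integers in its process() method.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_URIS 64

/* Toy URI <-> integer table (hypothetical). Strings are not copied,
 * so callers must keep them alive. */
typedef struct {
    const char* uris[MAX_URIS];
    uint32_t    count;
} UriMap;

/* Return the ID for `uri`, assigning the next free one if it is new.
 * Returns 0 if the table is full. */
static uint32_t uri_map(UriMap* map, const char* uri)
{
    assert(map != NULL && uri != NULL);
    for (uint32_t i = 0; i < map->count; ++i) {
        if (strcmp(map->uris[i], uri) == 0) {
            return i + 1; /* same URI, same ID */
        }
    }
    if (map->count == MAX_URIS) {
        return 0;
    }
    map->uris[map->count++] = uri;
    return map->count;
}

/* Reverse lookup, e.g. for serialisation; NULL for unknown IDs. */
static const char* uri_unmap(const UriMap* map, uint32_t id)
{
    return (id >= 1 && id <= map->count) ? map->uris[id - 1] : NULL;
}
```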

Other primitive types have variable size. String and Literal represent raw strings and string literals with a datatype or language, respectively. A Chunk contains an opaque chunk of data.

Larger structures can be built from these primitives using collections. The most basic collection type is Tuple, a heterogeneous series of atoms of any type. An Object is a set of properties, each with a URID key and a value of any type. Tuple and Object are analogous to arrays and objects in JSON, or tuples and dictionaries in Python [van Rossum, 2010], respectively: universal containers that can express almost any structured data.
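A Tuple body is simply its child atoms laid end to end. The sketch below appends an Int and a String to a Tuple and then walks them back. It makes two assumptions, both from our reading of the specification: each child is padded to 64 bits, and a String body includes its terminating NUL byte. The numeric type IDs stand in for host-mapped URIDs. tuple_append and tuple_next are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch only: helper names are hypothetical. */
typedef struct {
    uint32_t size;
    uint32_t type;
} LV2_Atom;

static uint32_t pad8(uint32_t size) { return (size + 7U) & ~7U; }

/* Append a child atom to a Tuple built in a buffer of `capacity` bytes.
 * The Tuple body is just its children, each padded to 64 bits. */
static int tuple_append(LV2_Atom* tuple, uint32_t capacity,
                        uint32_t type, const void* body, uint32_t size)
{
    assert(tuple != NULL && body != NULL);
    const uint32_t offset = (uint32_t)sizeof(LV2_Atom) + tuple->size;
    const uint32_t needed = pad8((uint32_t)sizeof(LV2_Atom) + size);
    if (offset + needed > capacity) {
        return 0;
    }
    LV2_Atom* child = (LV2_Atom*)((uint8_t*)tuple + offset);
    child->size = size;
    child->type = type;
    memcpy(child + 1, body, size);
    tuple->size += needed;
    return 1;
}

/* Child after `child` (the first one if `child` is NULL), or NULL at the end. */
static const LV2_Atom* tuple_next(const LV2_Atom* tuple, const LV2_Atom* child)
{
    const uint8_t* end = (const uint8_t*)(tuple + 1) + tuple->size;
    const uint8_t* p   = child
        ? (const uint8_t*)child + pad8((uint32_t)sizeof(LV2_Atom) + child->size)
        : (const uint8_t*)(tuple + 1);
    return (p + sizeof(LV2_Atom) <= end) ? (const LV2_Atom*)p : NULL;
}
```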

The remaining collection types are essentially optimisations for audio applications. A Sequence is, like Tuple, a series of atoms, but each is preceded by a time stamp.³ A Vector is a series of fixed-length atoms with the same type and no headers, making the vector body a regular C array. Sound is a descriptive type, identical in format to a Vector of float, but explicitly representing a sample of audio.
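Because a Vector body is a plain C array, reading it needs no per-element parsing. This sketch assumes, as we read atom.h, that the body starts with a small header giving the element size and element type, followed directly by the elements. It computes the element count and sums a Sound-style float vector. VectorBody and the helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: the body layout is assumed, names are hypothetical. */
typedef struct {
    uint32_t size;
    uint32_t type;
} LV2_Atom;

/* Assumed Vector body header; the elements follow immediately. */
typedef struct {
    uint32_t child_size; /* bytes per element, e.g. sizeof(float) */
    uint32_t child_type; /* URID of the element type */
} VectorBody;

static uint32_t vector_length(const LV2_Atom* vec)
{
    const VectorBody* body = (const VectorBody*)(vec + 1);
    assert(vec->size >= sizeof(VectorBody) && body->child_size > 0);
    return (vec->size - (uint32_t)sizeof(VectorBody)) / body->child_size;
}

static const void* vector_data(const LV2_Atom* vec)
{
    return (const uint8_t*)(vec + 1) + sizeof(VectorBody);
}

/* Sum the samples of a float vector, indexing the body as a C array. */
static float vector_sum_floats(const LV2_Atom* vec)
{
    const float*   samples = (const float*)vector_data(vec);
    const uint32_t n       = vector_length(vec);
    float sum = 0.0f;
    for (uint32_t i = 0; i < n; ++i) {
        sum += samples[i];
    }
    return sum;
}
```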

3.2 Portable Serialisation 

In addition to a binary format, each atom type has a portable serialisation. This allows implementations (typically hosts) to convert atoms to and from text for portable storage, network transmission, or human readability. This format is used to describe atoms in the following sections, but it is important to keep in mind that plugins work with atoms in their native binary form.

Most primitive types are associated with XSD [W3C, 2004b] datatypes which define their textual format. Table 1 shows this mapping along with an example string. URID is omitted since the portable serialisation of a URID is a URI.


Atom     XSD            Example
Bool     boolean        true
Chunk    base64Binary   vu/erQ==
Double   double         2.99e8
Float    float          0.6180
Int      int            -42
Long     long           4294967296
String   string         hello
URI      anyURI         http://lv2plug.in/
Path     anyURI         /home/drobilla/

Table 1: Text serialisation for primitives.
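The Chunk example in Table 1 can be checked with a short base64 encoder that uses the standard alphabet. Encoding the four bytes 0xBE 0xEF 0xDE 0xAD yields vu/erQ==. This is only a sketch of the textual form. It does not show how a serialiser such as Sratom produces it internally, and base64_encode is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch only: a plain base64 encoder (standard alphabet). */
static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode `len` bytes as base64 text (the xsd:base64Binary form) into
 * `out`, which must hold at least 4 * ((len + 2) / 3) + 1 chars.
 * Returns the number of characters written, excluding the final NUL. */
static size_t base64_encode(const uint8_t* in, size_t len, char* out)
{
    assert(out != NULL && (in != NULL || len == 0));
    size_t o = 0;
    for (size_t i = 0; i < len; i += 3) {
        const size_t   remaining = len - i;
        const uint32_t b0 = in[i];
        const uint32_t b1 = remaining > 1 ? in[i + 1] : 0;
        const uint32_t b2 = remaining > 2 ? in[i + 2] : 0;
        const uint32_t triple = (b0 << 16) | (b1 << 8) | b2;
        out[o++] = b64_alphabet[(triple >> 18) & 0x3F];
        out[o++] = b64_alphabet[(triple >> 12) & 0x3F];
        /* Missing input bytes are signalled with '=' padding. */
        out[o++] = remaining > 1 ? b64_alphabet[(triple >> 6) & 0x3F] : '=';
        out[o++] = remaining > 2 ? b64_alphabet[triple & 0x3F] : '=';
    }
    out[o] = '\0';
    return o;
}
```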


Containers and Literal have an abstract RDF 
[W3C, 2004a] serialisation which can technically 
be written in many formats. Here, the syntax of 
choice is Turtle [Beckett and Berners-Lee, 2011], 
which is used in LV2 data files. 

All containers have a portable serialisation, but this paper focuses on the use of Object. The format for the other containers is omitted for brevity, but can be found in the LV2 atom specification.

An object in Turtle begins with its ID, followed by properties separated with semicolons. A period terminates the description. For example, an Object named eg:control with three properties can be written as:

eg:control
    lv2:minimum 0.0 ;
    lv2:maximum 1.0 ;
    lv2:default 0.5 .

3 Currently time stamps are always in samples, though other 
units are possible. 
The ID and properties shown here are abbreviated URIs; for example, lv2:minimum is actually the URI http://lv2plug.in/ns/lv2core#minimum. A full Turtle document has prefix directives to define these precisely.

Numbers are shown unquoted, which is valid but does not precisely map to Atom types (e.g. 1.0 could be a Float or a Double). To preserve type in a machine serialisation, explicitly typed literals like "1.0"^^xsd:float are used instead.

3.2.1 Serialisation in Practice 

A text-based format for describing atoms facilitates discussion, but is also useful in practice. The Sratom [Robillard, 2012c] library provides a simple C API for lossless round-trip serialisation of any atom built from the standard types. This is used in several different scenarios:

• Saving plugin state in sessions, which is supported by many hosts.

• Jalv [Robillard, 2012a], a single-plugin host for Jack [Davis, 2001], can log all communication between plugin and UI to the console. This is particularly useful for debugging.

• Ingen [Robillard, 2014], a modular plugin host and plugin itself, has a UI that communicates with the engine exclusively via atoms. When running as a plugin, binary atoms are sent via AtomPort, but the UI can also run remotely by communicating over a TCP socket in Turtle. This way, UIs on different architectures can control the engine, including those written in non-C languages like Python or JavaScript.

3.3 URIs and Extensibility

Types and properties are identified by URI. The benefit of URIs is that anyone can define new terms without needing to worry about clashes or centralised coordination.

In the context of LV2 atoms, this allows developers to invent new types and properties without requiring “approval”. This freedom is particularly useful while developing new ideas, be they experimental, for internal use only, or intended for eventual standardisation.

For example, the previous sections use the lv2:minimum property, but suppose a plugin developer additionally needs to describe a “sweet spot” for controls. There is no standard LV2 property for this concept, so the developer can define their own (e.g. http://drobilla.net/ns/sweetSpot), use it in their data files, implement host support if necessary, send it between plugins or between plugin and UI, and so on. The implementation can be tested and released to the public without any binary compatibility issues. This flexibility allows LV2 to evolve to meet real developer needs with minimal friction.

Note that URIs here simply serve as global identifiers, and are not required to actually resolve on the Internet. However, developers should use URIs in domains where they could host pages, since this avoids potential conflicts.⁴ There is no need to own an entire domain; for example, many plugins use URIs at popular project hosting sites.

URI schemes other than HTTP may be used, but are not recommended. One advantage of HTTP is the ability to have URIs resolve to useful resources, particularly documentation. All standard LV2 URIs work this way, so documentation is often just a click away (follow the above lv2:minimum URI for an example). The LV2 distribution includes a tool, lv2specgen, which generates documentation for types and properties which are defined in Turtle.

4 Applications 

4.1 Time 

The most common use of objects to communicate between plugins and hosts is transport synchronisation. To keep plugins updated with tempo information, hosts send an object with properties describing the current time and tempo whenever changes occur.

Most hosts send updates that roughly correspond to Jack transport information, but with floating point beats instead of PPQN ticks, and a single floating point speed instead of only “rolling” or “stopped”. For example:


[]
    a time:Position ;
    time:frame 88200 ;
    time:speed 0.0 ;
    time:bar 1 ;
    time:barBeat 0.0 ;
    time:beatUnit 4 ;
    time:beatsPerBar 4.0 ;
    time:beatsPerMinute 120.0 .


The “a” here is Turtle shorthand for “is a” or “type”, equivalent to the rdf:type property.
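The values in this example are mutually consistent under some assumptions that the text does not state:

- a sample rate of 44.1 kHz;
- a constant tempo since frame 0;
- bars counted from zero.

Under these assumptions, 88200 frames is 2 s, i.e. 4 beats at 120 beats per minute. With 4 beats per bar, that is the start of bar 1. The hypothetical helper below performs this arithmetic. It is a sketch, not how any particular host computes its position.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bar and beat-within-bar from an absolute frame,
 * assuming a constant tempo since frame 0 and zero-based bars. */
typedef struct {
    int64_t bar;      /* zero-based bar index */
    double  bar_beat; /* beats since the start of that bar */
} BarPosition;

static BarPosition position_from_frame(int64_t frame, double sample_rate,
                                       double beats_per_minute, double beats_per_bar)
{
    assert(frame >= 0);
    assert(sample_rate > 0.0 && beats_per_minute > 0.0 && beats_per_bar > 0.0);
    /* seconds * (beats per second) */
    const double beats = (double)frame / sample_rate * beats_per_minute / 60.0;
    BarPosition pos;
    pos.bar      = (int64_t)(beats / beats_per_bar); /* truncation == floor here */
    pos.bar_beat = beats - (double)pos.bar * beats_per_bar;
    return pos;
}
```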

4 Inventing URIs under other domains without permission is inappropriate!

4.2 UI Communication

Atoms are also useful for communicating with components other than the host. The most common of these in practice is communication between a plugin and a custom UI (which, in LV2, happens via ports). Many UIs need to perform more advanced operations than are possible via float control ports. For example, a plugin may include an envelope with an arbitrary number of points, which a UI could control with messages like

[]
    a eg:EnvelopeSegment ;
    eg:endX 1.6 ;
    eg:endY 0.5 ;
    eg:shape eg:linear .

Several projects have made use of such messages for controlling plugins from custom UIs. While host-transparent (and thus automatable) control is preferable, full control of some plugins requires messages that are not currently standardised (e.g. LV2 presently has no concept of envelope segments, or multi-dimensional controls in general). However, though the message does not have standardised semantics, it is built from standard atom types so that hosts can make some sense of it. In particular, hosts can serialise such messages for controlling a plugin running on a remote computer or embedded device. This is a good example of how an extensible model allows developers to achieve their goals without being held back by lagging standardisation or host support.

In the future, standardised message types will allow plugins and UIs to use event-based control with host support for friendly interfaces and automation, where appropriate.

4.3 Plugin State 

LV2 has a state extension which allows plugins to 
save and restore state beyond control port values. 
The state extension does not directly depend on 
the atom extension, but has a property-based API 
that meshes naturally with Object. Plugins use 
host-provided callbacks to save/restore a URID key, 
void* value, and URID type. 

The fact that plugin state and Object are both based on properties suggests an elegant approach to plugin design: one set of properties can serve both as plugin state and as a real-time control protocol. This means plugin developers do not need to design both a state model and a protocol, but can simply define a set of properties that describes their plugin’s state.

For example, the sampler example plugin included with LV2 can play any .wav file, and the sample can be loaded by sending a message like:


[]
    a patch:Set ;
    patch:property eg:sample ;
    patch:value </media/bonk.wav> .

The patch:Set type and properties used here are defined in the LV2 patch extension, which defines several message types for getting and setting property values.

The eg:sample property is saved as part of the sampler’s state. Thus, this single property is used both to control the plugin and to represent a value in saved state. There is no need to define both a special “set sample” command and a format for saving that information.

4.4 Properties 

Developers can invent new property URIs and use 
them in code without defining anything. However, it 
can be useful to define properties for documentation 
purposes, and in some cases host support. 

Properties are defined in Turtle, so they can be included alongside plugin descriptions. For example:

eg:sweetSpot
    a rdf:Property ;
    rdfs:domain lv2:ControlPort ;
    rdfs:range xsd:float ;
    rdfs:label "sweet spot" ;
    rdfs:comment "The nicest value." .

Defining properties in this machine-readable format is mainly useful for generating documentation (all standard LV2 properties are defined in this way), but this information can be used by hosts as well.

This area is still experimental, but for example, Jalv will show a file selector in its host-generated UI for plugins that support properties with Path values. Setting the property is achieved by sending a patch:Set message like the example shown in the previous section.

4.5 Presets and Default State 

Plugin descriptions can include a set of default state properties which should be loaded initially. A preset has a similar structure to a plugin description, and can also include state. This means that presets can not only set port values, but also restore arbitrary internal plugin state like loaded samples. The benefit of using standard atom types to describe state is that developers can write default state in plugin data files, and hosts can serialise state/presets in the same format. For example, a preset for a sampler can specify a sample to load like so:
eg:clickeyPreset
    lv2:appliesTo eg:sampler ;
    # ...
    state:state [
        eg:sample <click.wav>
    ] .

5 Reading and Writing Atoms 

It’s convenient to think of atoms in high-level terms, and to describe objects in human-readable Turtle, but plugins are typically written in C and must work with binary atoms. For simple primitive types like Int this is trivial: the appropriate structs can be created, copied, and read in the usual way.

Collections are more complex, since their bodies have variable size and possibly an irregular and/or nested structure. To make reading collections easier, iterators for each collection type are provided in a utility header.

For objects, iteration works, but is tedious and verbose in the typical case of getting a few property values. To make this case more succinct, a simple accessor for object properties is provided. For example, consider an object that describes a 2D point:

[]
    a eg:Point ;
    eg:x 1.0 ;
    eg:y 2.0 .

If obj points to this object, and ids contains the necessary mapped URIs, the eg:x and eg:y values can be accessed like so:

const LV2_Atom* x = NULL;
const LV2_Atom* y = NULL;
lv2_atom_object_get(obj,
                    ids.eg_x, &x,
                    ids.eg_y, &y,
                    0);

Here, x and y are pointed directly at the corresponding values within obj. There is no dynamic allocation, so this code is real-time safe and does not require the user to clean up x and y. If the object does not have a matching property, the result will be NULL. Note that this code will continue to work correctly even if additional properties are added to the object in the future.
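To make the lookup concrete, here is a simplified single-key sketch of what such an accessor has to do internally. It relies on an assumed mirror of the Object layout in atom.h: a body header with ID and type URIDs, then key/context/value properties, each padded to 64 bits. object_find is a hypothetical name. The real lv2_atom_object_get call is variadic and fills several pointers in one pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch only: layouts are assumed, names are hypothetical. */
typedef struct {
    uint32_t size;
    uint32_t type;
} LV2_Atom;

/* Assumed Object body header, followed by zero or more properties. */
typedef struct {
    uint32_t id;    /* URID of the object's ID, or 0 if anonymous */
    uint32_t otype; /* URID of the object's type, e.g. eg:Point */
} ObjectBody;

/* Assumed property header: key, context, then the value atom. */
typedef struct {
    uint32_t key;     /* URID of the property, e.g. eg:x */
    uint32_t context; /* not used in this sketch */
    LV2_Atom value;   /* value header; the value body follows it */
} PropertyHeader;

static uint32_t pad8(uint32_t size) { return (size + 7U) & ~7U; }

/* Return a pointer to the value stored under `key`, or NULL if the
 * object has no such property. The result points into the object
 * itself, so nothing is allocated or copied. */
static const LV2_Atom* object_find(const LV2_Atom* obj, uint32_t key)
{
    assert(obj != NULL && obj->size >= sizeof(ObjectBody));
    const uint8_t* end = (const uint8_t*)(obj + 1) + obj->size;
    const uint8_t* p   = (const uint8_t*)(obj + 1) + sizeof(ObjectBody);
    while (p + sizeof(PropertyHeader) <= end) {
        const PropertyHeader* prop = (const PropertyHeader*)p;
        if (prop->key == key) {
            return &prop->value;
        }
        /* Unknown keys are skipped, so extra properties do no harm. */
        p += pad8((uint32_t)sizeof(PropertyHeader) + prop->value.size);
    }
    return NULL;
}
```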

Writing collections can be trickier, particularly those with nested structure. For example, an Object property may have a Tuple or another Object as a value. Atoms can be constructed in-place by repeatedly appending to a buffer, but correctly maintaining container size fields and padding requirements can be a delicate task. To make writing simple, a forge API is provided which allows arbitrarily complex atoms to be constructed in a target buffer. The forge has a method for each atom type: for primitives it simply appends the given value, and for containers it appends the atom header and returns a frame which must be popped when the object is finished. Container sizes are updated automatically as atoms are written using this stack of frames. The forge is safe to use in real-time code, and can be used by plugins to write objects directly to AtomPort outputs in their process() method. For example, the same 2D point object can be written like so:

// Begin an anonymous eg:Point object
LV2_Atom_Forge_Frame frame;
lv2_atom_forge_object(forge, &frame, 0, ids.eg_Point);

// eg:x 1.0
lv2_atom_forge_key(forge, ids.eg_x);
lv2_atom_forge_float(forge, 1.0);

// eg:y 2.0
lv2_atom_forge_key(forge, ids.eg_y);
lv2_atom_forge_float(forge, 2.0);

// Finish object
lv2_atom_forge_pop(forge, &frame);
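The frame-stack idea behind the forge can be illustrated with a deliberately tiny, hypothetical re-implementation. It is not the LV2 forge API: MiniForge, MiniFrame and the mini_* functions are invented for illustration, and URIDs are passed as plain placeholder integers. We assume an Object body of ID and type URIDs followed by key/context/value properties, with every write padded to 64 bits. Each write is added to the size of every open container, so nested sizes stay correct without the caller tracking them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch only: not the real forge API; all names are hypothetical. */
typedef struct {
    uint32_t size;
    uint32_t type;
} LV2_Atom;

typedef struct {
    uint32_t id;
    uint32_t otype;
} ObjectBody;

/* One open container: where its header starts in the buffer. */
typedef struct MiniFrame {
    struct MiniFrame* parent;
    uint32_t          offset;
} MiniFrame;

typedef struct {
    uint8_t*   buf;      /* 64-bit aligned output buffer */
    uint32_t   capacity; /* bytes available in buf */
    uint32_t   used;     /* bytes written so far */
    MiniFrame* stack;    /* innermost open container, or NULL */
} MiniForge;

static uint32_t pad8(uint32_t n) { return (n + 7U) & ~7U; }

/* Append bytes, zero-padded to 64 bits, and grow every open container. */
static int mini_write(MiniForge* f, const void* data, uint32_t size)
{
    const uint32_t padded = pad8(size);
    if (f->used + padded > f->capacity) {
        return 0;
    }
    memcpy(f->buf + f->used, data, size);
    memset(f->buf + f->used + size, 0, padded - size);
    f->used += padded;
    for (MiniFrame* fr = f->stack; fr != NULL; fr = fr->parent) {
        ((LV2_Atom*)(f->buf + fr->offset))->size += padded;
    }
    return 1;
}

/* Begin an Object and push its frame; finish it with mini_pop(). */
static int mini_object(MiniForge* f, MiniFrame* frame, uint32_t object_urid,
                       uint32_t id, uint32_t otype)
{
    const struct { LV2_Atom atom; ObjectBody body; } head = {
        { (uint32_t)sizeof(ObjectBody), object_urid }, { id, otype }
    };
    frame->offset = f->used;
    if (!mini_write(f, &head, (uint32_t)sizeof head)) {
        return 0;
    }
    frame->parent = f->stack;
    f->stack      = frame;
    return 1;
}

/* Write a property key followed by a zero context field. */
static int mini_key(MiniForge* f, uint32_t key)
{
    const uint32_t key_and_context[2] = { key, 0 };
    return mini_write(f, key_and_context, (uint32_t)sizeof key_and_context);
}

/* Write a complete Float atom (header and body). */
static int mini_float(MiniForge* f, uint32_t float_urid, float value)
{
    const struct { LV2_Atom atom; float body; } a = {
        { (uint32_t)sizeof(float), float_urid }, value
    };
    return mini_write(f, &a, (uint32_t)sizeof a);
}

/* Close the innermost container; frames must be popped in LIFO order. */
static void mini_pop(MiniForge* f, MiniFrame* frame)
{
    assert(f->stack == frame);
    f->stack = frame->parent;
}
```

With this layout, the eg:Point example comes to a 56-byte Object body: 8 bytes of ID and type, plus two 24-byte properties.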

6 Conclusions and Future Work 

The LV2 Atom specification defines a simple binary format for any type of data, and an expressive type model for representing structured data within that format. This model has proven effective for representing plugin state, host-to-plugin communication such as tempo synchronisation, and custom control protocols such as between a plugin and its UI.

This work has laid the foundation for more powerful control of plugins and other real-time applications. There are two main areas of future work: additional convenience APIs and tools to make working with atoms as simple as possible, and building more advanced control protocols and other functionality using the atom model.

For convenience, the existing APIs described in Section 5 do a relatively good job of making it easy to construct and inspect atoms in C. However, some developers have found the forge confusing. It is difficult to make a fully capable writing API much simpler given the constraints of C and hard real-time, but one idea is to make a writing counterpart to lv2_atom_object_get() which works only for non-nested objects. Using C++, a similar but more elegant and type-safe interface would be possible, which could work even for nested containers. LV2 is defined in C, but a significant portion of the developer community uses C++, so a C++ convenience wrapper (including idiomatic iterators) would be a welcome improvement. Other minor improvements could ease the mechanics, but since several developers have successfully made use of atoms, focusing on this area may not be an effective use of time.

The other, more interesting, area for future work is building on the foundation of atoms to create more powerful control protocols. One of the biggest limitations of LV2 is the ControlPort inherited from LADSPA. Control ports can only hold a single float value, and tie the control rate to how often process() is called. This can be problematic for certain types of plugins. The lack of a mechanism for adding and removing ports also means that the set of controls is fixed, which prevents many possibilities such as the multi-point envelope example in Section 4.2. Using events for control instead of control ports can solve all of these problems. Events are much more powerful than a low-rate control signal, and allow a sample-accurate stream of changes to be sent to a plugin for an entire process() call. Logistically, this can be achieved via the current AtomPort + Sequence mechanism, but the structure of the required events is yet to be determined. Object will likely form the basis for future standard messages, due to its inherent meaningfulness and extensibility. There are many possibilities opened up by moving to events, including ramped/smoothed controls, gestures, precise voice control, and note-specific modulation/articulation. This is one of the most exciting frontiers of LV2 development; a powerful event-based control scheme will enable new functionality beyond the current capabilities of host-agnostic plugins.
Abstract

The Muditulib library is introduced and explained. Muditulib is mainly a library consisting of a collection of C header files that include functions written for the purpose of tuning tonal music within the diatonic scale. This scale, the library’s functions, and the pitch representation systems will be explained either in detail or briefly with references to other literature. A background in music theory is useful, though not necessary. Along with the Muditulib core functions, an implementation for Pure Data is published. Developers are encouraged to write implementations for other synthesizers, music production platforms or any other link in the chain of the tonal music production workflow.

Keywords 

Tuning systems, pitch representation / MIDI, software library.

1 Introduction 

Muditulib is developed to make the tuning of the common western (diatonic) scale easier, without being restricted to equal temperament of twelve tones per octave with a frequency ratio of 2:1.¹ Music theorists and mathematicians have developed many tunings for this scale through the ages. Modern software such as SCALA² can map all possible tunings to MIDI notes. In a flexible environment like Pure Data³ one could rather easily implement such a mapping oneself, so that is not the purpose of the library. Muditulib doesn’t really map fixed scales to MIDI notes, but offers multiple methods to tune notes or intervals more dynamically. In that respect Muditulib is more similar to the Hermode tuning system⁴, although the approach is quite different. Neither SCALA nor Hermode will be explained further here, as that is beyond the scope of this paper. The next sentence deserves its own emphasized paragraph.

1 The standard equation for translating MIDI notes to frequency is $f = 440 \cdot 2^{(m-69)/12}$, where m is the MIDI note number and f is the frequency in cycles per second. In this tuning a diatonic semitone equals a chromatic semitone. Moreover, expressed as frequency ratios, a semitone equals the square root of a whole tone; thus, on a logarithmic scale, the semitone equals half a whole tone.

2 http://www.huygens-fokker.org/scala/

3 http://puredata.info/

Muditulib has nothing to do with microtonality, microtonal music, or microtones, whatever may be meant by those obfuscating terms.

This document is intended as an explanation of the software library Muditulib, along with a short summary of my research within the field of tuning and music theory, rather than as a purely scientific article that describes research goals, methods, and conclusions. Its purpose is to propose several tuning and pitch representation systems to an audience of music software developers.

2 The diatonic scale 

In order to understand the approach described here, it will be helpful to explain a little about the diatonic system, particularly the distinction of variable steps in a scale. This is done most easily by freely citing a recent work by the present author in the next two paragraphs [Seelen, 2014].

The terms chromatic and diatonic descend from the old Greek musical system. Together with the enharmonic, they formed the three tetrachords from which the Greek musical scales were made up [Grout and Palisca, 1988, ch. 1]. The ancient tuning theory is clearly described by J. Murray Barbour [Barbour, 2004, ch. II]. Today’s use of those terms is somewhat related to that of their namegivers, although the tetrachord itself has lost its significance. The diatonic scale is a scale that consists of seven intervals or steps. The eighth note, or the octave, is a repetition of the first one, usually with a frequency ratio of 2:1. Those seven steps are divided into five larger ones, the whole tones, and two smaller ones, the semitones. The scale thus created is actually the same as two Greek diatonic tetrachords on top of each other, overlapping at one side (conjunct) and separated by one whole tone at the other side (disjunct). By the chromatic scale, however, a division of the octave into twelve equal parts is usually meant, which is quite different from an accumulation of Greek chromatic tetrachords.

4 http://www.hermode.com/index_en.html

So far I have described the historical outlines of the system. In the frequency domain the relation between the octave x, the whole tone T and the semitone s is as shown in equation 1, where 1 < s < T.

$$x = T^{5} \cdot s^{2} \qquad (1)$$

The citation [Seelen, 2014] ends here. For further reading I refer to the mentioned article. The main point is that the tonal system used as a starting point for the tuning system is a 7-tone and not a 12-tone system, as western tonal music is often incorrectly described. The idea of a 12-tone system simply evolved from practical tuning matters concerning the 7-note system. For clarity: the diatonic scale is a theoretic scale rather than a scale of fixed frequency relationships 5.
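As a worked instance of equation 1 (an illustration added here, not part of the cited article), the Pythagorean whole tone T = 9/8 and semitone (limma) s = 256/243 satisfy 1 < s < T and yield an exact 2:1 octave:

$$T^{5} \cdot s^{2} = \frac{9^{5}}{8^{5}} \cdot \frac{256^{2}}{243^{2}} = \frac{59049}{32768} \cdot \frac{65536}{59049} = \frac{65536}{32768} = 2$$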

3 Ts, a two-dimensional pitch 
representation system 

Traditional western music notation is based on seven syllables 6 or alphabetical characters 7, which correspond to the graphical notes. In contrast to a one-dimensional representation like MIDI note numbers, western music notation makes a clear distinction between, for example, a C-sharp and a D-flat. All theoretic tone intervals consist of a number of whole tones and semitones, instead of just a number of chromatic semitones 8. The system can therefore be interpreted as two-dimensional. In the previously mentioned article [Seelen, 2014] the Ts() tonal representation system is proposed, consisting of the two values T_n and s_n, the number of whole tones and semitones, respectively. Its advantage is that it can easily be translated to the MIDI note system, as shown in equation 2. Its reference Ts(0,0) is set equal to the C corresponding to MIDI note 0. Therefore Ts(25,10) corresponds to middle C (Lilypond: c').

$$m = 2 \cdot T_n + s_n \qquad (2)$$

5 E.g. the diatonic perfect fifth can be tuned to 3/2, as well as to many other frequency ratios.

6 do-re-mi-fa-so-la-ti

7 C-D-E-F-G-A-B

8 The word chromatic is emphasized because this term can be interpreted in various ways and is therefore confusing. In this case one twelfth of an octave is meant.



Figure 1: Examples of [midi2ts] and [ts2symbol] 
in Pure Data. 
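Equation 2 can be sketched in C as follows. The struct and function names are hypothetical illustrations, not Muditulib's actual API. The many-to-one nature of the mapping is exactly the irreversible data loss discussed in section 3.3: C-sharp and D-flat land on the same MIDI number.

```c
/* Ts value: whole tones T_n and semitones s_n above the reference
 * Ts(0,0), which equals MIDI note 0 (a C). */
typedef struct {
    int T;
    int s;
} Ts;

/* Equation 2: m = 2 * T_n + s_n. C-sharp Ts(1,-1) and D-flat Ts(0,1)
 * both map to MIDI note 1, so the inverse cannot be recovered. */
int ts_to_midi(Ts p)
{
    return 2 * p.T + p.s;
}
```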


3.1 Ts to note name symbol 

The translation of Ts(T_n, s_n) to a note name symbol (the Lilypond 9 standard) is done by translating the total number of steps (T_n + s_n) to the root character plus octave designation 10. Then the deviation from the reference of the root character is calculated and translated into an amount of flattening or sharpening.
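The two steps above can be sketched in C, assuming LilyPond's default Dutch note names (suffix "is" for each sharp, "es" for each flat) and absolute octave marks, with the octave containing Ts(25,10) printed as c'. The function name ts_to_symbol is hypothetical; the sketch always appends the full suffix, so it does not produce the contracted spellings es and as for E-flat and A-flat.

```c
#include <stddef.h>

/* Whole tones of the unaltered c d e f g a b within one octave. */
static const int nat_T[7] = {0, 1, 2, 2, 3, 4, 5};
static const char letters[7] = {'c', 'd', 'e', 'f', 'g', 'a', 'b'};

/* Floor division, so that notes below Ts(0,0) are handled too. */
static int floor_div(int a, int b)
{
    return (a >= 0) ? a / b : -((-a + b - 1) / b);
}

/* Writes the note name of Ts(T, s) into out (len >= 2). */
void ts_to_symbol(int T, int s, char *out, size_t len)
{
    int steps = T + s;                     /* diatonic steps above Ts(0,0) */
    int oct = floor_div(steps, 7);         /* octave designation */
    int root = steps - 7 * oct;            /* 0..6: root character */
    int dev = T - (5 * oct + nat_T[root]); /* >0 sharpened, <0 flattened */
    size_t n = 0;

    out[n++] = letters[root];
    for (int i = 0; i < dev && n + 3 < len; i++) { out[n++] = 'i'; out[n++] = 's'; }
    for (int i = 0; i > dev && n + 3 < len; i--) { out[n++] = 'e'; out[n++] = 's'; }
    for (int i = 4; i < oct && n + 2 < len; i++) out[n++] = '\'';  /* above c */
    for (int i = 4; i > oct && n + 2 < len; i--) out[n++] = ',';   /* below c */
    out[n] = '\0';
}
```

For instance, Ts(29,11) gives a', matching the reference of equation 3, and Ts(25,11) gives des'.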

3.2 Ts to frequency 

This translation can be summarized by equation 3, where f is the frequency, A is the reference frequency for Ts(29,11), corresponding to the note a', x is the frequency ratio of the octave, and r is equal to log s / log T and represents the semitone to whole tone ratio.

$$f = x^{\frac{(T_n - 29) + (s_n - 11)\,r}{5 + 2r}} \cdot A \qquad (3)$$

Usually x is set to 2. The ratio r then defines the kind of tuning. When r is set to about 0.6, the tuning could be said to be within the meantone zone; r = 0.6, i.e. r = 3/5, corresponds to 31-TET, since each semitone is made up of 3 and each whole tone of 5 dieses 11. The exact meantone temperament (T = √5/2), Pythagorean tuning, and other examples, together with how their parameters are calculated, are shown in table 1.

9 http://lilypond.org/

10 Division by seven and its remainder (modulo).


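Tuning / temperament | Equation / calculation | Parameters
Mean tone | $\frac{5}{4} = 2^{2/(5+2r)}$ | $r \approx 0.60628$
Pythagorean | $\frac{3}{2} = 2^{(3+r)/(5+2r)}$, i.e. $r = \frac{\log s}{\log T} = \frac{\log(256/243)}{\log(9/8)}$ | $r \approx 0.44247$
Tritone temperament | $\frac{7}{5} = 2^{3/(5+2r)}$ | $r \approx 0.59006$
Stretched octave, perfect 5th and 3rds | $\frac{5}{4} = \left(\frac{3}{2}\right)^{2/(3+r)}$, $x = \left(\frac{3}{2}\right)^{(5+2r)/(3+r)}$ | $r \approx 0.63412$, $x \approx 2.01246$
19-TET | $s = 2^{2/19}$, $T = 2^{3/19}$ | $r = 2/3$
31-TET | $s = 2^{3/31}$, $T = 2^{5/31}$ | $r = 3/5$
53-TET | $s = 2^{4/53}$, $T = 2^{9/53}$ | $r = 4/9$

Table 1: Tuning examples of the two-dimensional system.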


3.3 MIDI note numbers to Ts 

Ideally the MIDI note system is skipped in all translations of tonal data. The translation from Ts to MIDI, from two dimensions to one as shown in equation 2, leads to irreversible data loss. The same applies to MIDI files exported from Lilypond. However, even if composed diatonic music material were no longer translated into MIDI, improvisations on MIDI keyboards would still have to be interpreted by the computer. The simplest and probably best way of doing this is to leave the decision to the performer.


11 The diesis is the interval that remains to the octave after an accumulation of three perfectly tuned (5/4) major thirds (e.g. B-sharp to C). This typical meantone interval remainder is approximately a 31st of an octave. A.D. Fokker uses this term to indicate the smallest interval in 31-TET [Fokker and Pol, 1942].


3.3.1 User-defined 

In Muditulib this is done by setting a modulation parameter (mod). The default (mod = 0) is, as a starting reference, set to two flats (E-flat and B-flat) and three sharps (F-sharp, C-sharp, and G-sharp), similar to the baroque standard. Every modulation up replaces one note in the circle of fifths by adding (1,-2) to its assigned Ts value, starting at MIDI note 3 (E-flat to D-sharp). In the opposite direction it starts at MIDI note 8 (G-sharp to A-flat). In the current implementation each modulation change is calculated from a default array at ‘mod 0’.
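A minimal sketch of this user-defined mapping, assuming non-negative MIDI note numbers. The arrays encode the default described above (two flats, three sharps); the function name midi_to_ts and its loop structure are illustrative and not taken from the Muditulib source. Every alteration by (1,-2) or (-1,2) leaves 2T + s, and hence the MIDI number, unchanged.

```c
/* Ts of the twelve pitch classes at mod = 0:
 * C C# D Eb E F F# G G# A Bb B */
static const int base_T[12] = {0, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 5};
static const int base_s[12] = {0, -1, 0, 1, 0, 1, 0, 1, 0, 1, 2, 1};

/* Maps a MIDI note to Ts for modulation setting mod, always starting
 * again from the default array. Each step up adds (1,-2) to the next
 * note along the circle of fifths, starting at MIDI 3 (Eb -> D#); each
 * step down adds (-1,2), starting at MIDI 8 (G# -> Ab). */
void midi_to_ts(int midi, int mod, int *T, int *s)
{
    int oct = midi / 12;   /* assumes midi >= 0 */
    int pc = midi % 12;
    int dT = 0, ds = 0;

    for (int i = 0; i < mod; i++)      /* sharpward: 3, 10, 5, 0, ... */
        if ((3 + 7 * i) % 12 == pc) { dT += 1; ds -= 2; }
    for (int i = 0; i < -mod; i++)     /* flatward: 8, 1, 6, 11, ... */
        if ((8 + 5 * i) % 12 == pc) { dT -= 1; ds += 2; }

    *T = 5 * oct + base_T[pc] + dT;
    *s = 2 * oct + base_s[pc] + ds;
}
```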

3.3.2 Real-time pitch spelling 

Another way to enrich the poor MIDI note data is the algorithmic approach. If, during a performance, a listener is able to roughly extract information about key, mode or tonality, the computer should be able to do so too, if programmed according to a realistic cognition model. The translation from MIDI note data to staff notation or note names is called pitch spelling and is generally not a real-time practice. Several researchers have developed algorithms over the last decades [Longuet-Higgins and Steedman, 1971; Temperley, 2004; Cambouropoulos, 2003; Meredith, 2003; Chew and Chen, 2005; Honingh, 2006]. A closely related topic is key-finding. In the end, such procedures are all about saying something about the function of, and the relation between, tonal events. Since in some styles of music it is not always clear in which tonal direction the music will develop, a perfect real-time solution is theoretically impossible whenever no true sense of tonality is present at a given moment. The duty of the algorithm, however, is not to offer perfect sheet music, but to offer input for a real-time controlled dynamic tuning. Errors are acceptable, at least in cases where human perception is uncertain. Ideally, any uncertain choice of human tonal perception corresponds to the same uncertainty of the algorithm. An algorithm developed by the present writer, based on memory, predicting, counting, averaging, and interval comparing, will be included in the library.

4 Tts, adding a third variable

Just intonation is a tuning approach in which all tone intervals are based on integer relationships. Pythagorean tuning can be considered ‘just’. It is based on a perfect fifth (3:2) and a perfect octave (2:1) ratio. All intervals are then made up of powers of the prime numbers two and three 12. As the meantone temperament showed, however, the perfect major third ratio is 5:4, but adding the number five to the tuning system introduces a problem. A major third cannot be divided into two equal whole tones within just intonation, since the mean tone is not an interval based on an integer relationship. Therefore, the major third is divided into a large and a small whole tone. This way, thirds, both major and minor, perfect fifths as well as octaves, and therefore all octave inversions of the mentioned intervals, can be tuned correctly 13.

4.1 Tts to frequency 

Tuning the Tts values is not a big deal, since there seems to be only one perfect solution, in which all octaves, perfect fifths, and thirds are tuned in the most ideal way. The frequency ratios then should be T = 9/8, t = 10/9 and s = 16/15. The translation to frequency is best summarized by equation 4, where A is the reference a' at Tts(17,12,11) 14.

$$f = \left(\tfrac{9}{8}\right)^{T_n - 17} \cdot \left(\tfrac{10}{9}\right)^{t_n - 12} \cdot \left(\tfrac{16}{15}\right)^{s_n - 11} \cdot A \qquad (4)$$

However, some adjustments are still conceivable. The 53-TET system, for example, with r = 4/9 and therefore approximately Pythagorean tuning, can be divided into three variable steps. Instead of a semitone of four commas 15 and a whole tone of nine, the semitones are enlarged to five commas at the expense of two of the five whole tones, which shrink to eight commas. Note that the result would not deviate much from the tuning proposed in equation 4.

4.1.1 Turkish modes 

An idea for further improvement would be to combine both the two- and three-dimensional interpretations of 53-TET, or, alternatively, the three-limit Pythagorean tuning and the five-limit just intonation. This would result in even more variables or steps, namely those used in Turkish modes (makamlar): bakiye (4 commas), küçük mücennep (5 commas), büyük mücennep (8 commas), tanini (9 commas), and the augmented second artık ikili (12 commas) 16 [Signell, 1986 1977]. Muditulib could then be made very suitable for digitally synthesized reproduction of Turkish classical music or anything similar.

12 This is called three-limit.

13 According to renaissance counterpoint prescriptions, all intervals considered consonant [Mann, 1987], whether perfect or imperfect, are now covered.

14 That is, the reference is NOT MIDI note number 69.

15 In contrast to 31-TET, a part is called a comma instead of a diesis. This comma refers to the syntonic comma, which will be discussed later.

4.2 Creating useful Tts data 

In contrast to Ts data, Tts data cannot be extracted from regular scores when it concerns diatonic music. Tts values must then be obtained from Ts or even MIDI. This raises the problem of the syntonic comma, that is, the difference between the large and the small whole tone (81/80). This means that to fit the needs of a certain interval, another interval might be tuned too wide. In a dynamic tuning this comma can be moved on the fly. In a more fixed tuning the comma will stick to its initial place and be rather noticeable. For example, from this point of view Turkish modes are based on the placement of syntonic commas, giving each makam its very own character, based on some slightly wider and narrower intervals 17. Only fifths and octaves are always tuned to perfection. One could possibly write a book about the placement of the syntonic comma. However, for the moment the present author prefers to skip such a time-consuming research effort and focuses on two approaches, proposed in the next paragraphs. Again, there is a relatively simple approach and a more elaborate one.

4.2.1 MIDI to Tts, user-defined 

The simple approach is the user-defined key setting. The user defines the mode and the starting MIDI note. Currently two modes are available, one minor and one major, both displayed in table 2. Different modes can be created by moving the pattern to another reference MIDI note or, of course, by editing the source code or submitting a feature request 18. For a piece in E minor one would usually choose ‘MIDI note 4’ as reference and ‘0’ (minor) as mode. These default patterns are carefully chosen to enable the best standard modulations from the reference key without resulting in too many ‘misplaced’ commas 19. How these patterns were actually chosen is not discussed here, to avoid going too far into music-theoretical considerations. Furthermore, more research on this topic would be desirable, for example comparing these considerations to those of how frets on a saz are placed.

16 The mentioned augmented second, that is, a perfect fourth (22 commas) minus two large semitones (5+5 commas), is clearly not unique here and cannot be considered an extra variable. The only difference is that both Pythagorean (small) and five-limit (large) semitones appear. That makes a total of four different steps.

17 An example of this is the uşşak makam, starting with the relatively small Pythagorean minor third (13 instead of 14 commas), although consisting of a small whole tone (8) and a large semitone (5) from the tonic [Signell, 1986 1977].

18 The pattern is best recognized by looking at the ‘difference’ column. The non-steps (places where no semitone or step occurs, e.g. ‘T/s’) are grouped just like the black keys on a keyboard.


Offset | Minor (0) Dif. | Total | C. | Major (1) Dif. | Total | C.
0 | s | - | 0 | s | - | 0
1 | s | s | 5 | T/s | T/s | 4
2 | T/s | T | 9 | s | T | 9
3 | s | Ts | 14 | s | Ts | 14
4 | t/s | Tt | 17 | t/s | Tt | 17
5 | s | Tts | 22 | s | Tts | 22
6 | T/s | TTt | 26 | T/s | TTt | 26
7 | s | TTts | 31 | s | TTts | 31
8 | s | TTtss | 36 | s | TTtss | 36
9 | T/s | TTTts | 40 | t/s | TTtts | 39
10 | s | TTTtss | 45 | s | TTttss | 44
11 | t/s | TTTtts | 48 | T/s | TTTtts | 48

Table 2: Tuning patterns for the MIDI keyboard: modes and modulation options from a reference tonic. Shown are the differences to the previous MIDI note and the total distance to the reference in symbols and in commas (C.). Offset is the number of MIDI notes above the reference.

4.2.2 A pattern matching approach, or the hexachord analysis algorithm

Another way of tuning is to leave this task, again, to a real-time controlling algorithm. Since singers in the Middle Ages used Guido of Arezzo's hexachord to choose their pitches [Grout and Palisca, 1988], the hexachord seems very suitable for on-the-fly tuning purposes. The next question is how the hexachord should then be tuned. Hermann von Helmholtz has been very helpful in answering this question, since he explains how medieval singers related each note of the hexachord to a reference and what differences exist between major (on Ut) and minor (on Re) modes [Helmholtz, 1896, ch. 18]. The resulting conclusions are displayed in table 3. The Re and Sol are placed one comma lower in minor mode. However, for modulation purposes the major hexachord is placed one comma lower than the minor altogether. The translation from Ts to Tts is done by pattern matching 20. The choice between major and minor tuning of each individual hexachord is made by a tonic-finding algorithm.

19 E.g. the fourth of 23 commas on the subdominant in minor mode (from Tts to TTTtss).



 | Ut | Re | Mi | Fa | Sol | La
Minor | 0 | 8 | 17 | 22 | 30 | 39
 | - | t | T | s | t | T
Major | -1 | 8 | 16 | 21 | 30 | 38
 | - | T | t | s | T | t

Table 3: The tuning of the hexachord, displayed in commas.
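The pattern-matching step itself is not specified here, but the result it relies on, table 3, can be sketched as a small lookup in C. The names are hypothetical. The tests confirm that the pure fifths (31 commas) lie on Ut to Sol in major and on Re to La in minor, and that Ut to Fa is a pure fourth (22 commas) in both modes.

```c
#include <stdbool.h>

/* Table 3: positions of Ut, Re, Mi, Fa, Sol, La in commas. */
static const int hex_pos[2][6] = {
    {0, 8, 17, 22, 30, 39},  /* mode 0: minor (on Re) */
    {-1, 8, 16, 21, 30, 38}, /* mode 1: major (on Ut), one comma lower */
};

/* Interval in commas from degree `from` to degree `to`
 * (0 = Ut ... 5 = La) within one hexachord; false on bad input. */
bool hexachord_interval(int mode, int from, int to, int *commas)
{
    if (mode < 0 || mode > 1 || from < 0 || from > 5 || to < 0 || to > 5)
        return false;
    *commas = hex_pos[mode][to] - hex_pos[mode][from];
    return true;
}
```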


5 Implementation 

All the previously mentioned functionality will be bundled into one file, a collection of C functions 21. These functions will be explained in a reference manual at http://muditulib.eu.

In essence this library is relatively small and simple. The challenging part is probably the implementation, depending on the environment. An implementation for Pure Data is already available and consists of C files written against the Pd API to create a collection of separate Pd classes, along with the Muditulib core functions, a Makefile based on the template by H.-C. Steiner, help files and examples. An example of the Pd implementation is shown in figure 1.

6 Concluding remarks 

The tuning approaches described here depend highly on implementation possibilities. For a low-level music production environment like Pure Data there is actually no problem, although this requires considerable background knowledge from the user, of both music and tuning theory. Most popular electronic music production platforms, however, are mainly based on the MIDI note system. Tuning workarounds making use of ‘pitch bend’ are familiar to the present author, though they are not a satisfying solution. There are plans to develop a file format other than MIDI; more about this can be expected in the near future. Any suggestions about file formats or implementations, and especially questions arising from implementation ambitions, are very welcome.

20 The pattern of the diatonic hexachord is T-T-s-T-T.

21 http://sourceforge.net/projects/muditulib/
Abstract

The deployment of distributed audio systems in the context of computer music and audio installations is explored in this paper, expanding the vision of static streaming audio networks to flexible dynamic audio networks. Audio data is sent on demand only. Sharing sources and sinks allows arbitrary audio networks.

This led to the idea of message-based audio systems, which has been investigated in two use cases: playing on an Ambisonics spatial audio system, and within a computer music ensemble.

In a first implementation, Open Sound Control (OSC) is used as the content format, proposing a definition of Audio over OSC (AoO).

Keywords 

audio-interfaces, networked audio, OSC, computer-music ensembles, sound installation

1 Introduction 

The first idea of a message-based audio system came up with the requirement of playing a multi-speaker environment of distributed networked embedded devices from several computers, avoiding a central mixing desk.

Another demand for a message-based audio network came up during the development of a flexible audio network within the ICE ensemble 1 [IEM, 2011]. A variable number of computer musicians sending time-bounded audio material with their computers to other participants (for monitoring or collecting audio material) would have required a complex audio-matrix setup of quasi-permanent network connections, with all the negotiations and initializations for these streams. Not only because of the limited rehearsal time, this seemed both too error-prone and overkill in terms of network load.

The structure of a functional audio network for ICE, especially during improvisation sessions, cannot always be foreseen and is therefore hard to implement as a static network. It is therefore important to be able to change the audio network easily during a performance, as musicians come and leave (and reboot). On the other hand, the need for low latency, responsiveness and sufficient audio quality has to be respected even while network connections change dynamically. No strict requirements on sample rates, sample-accurate synchronization or the use of a single common audio format should be imposed in such situations. It should be possible to freely add or remove audio-related devices to/from the system without a complicated setup of audio streams and without negotiating metadata between the participants. This should simplify the implementation of the individual nodes.

1 IEM (Institute of Electronic Music and Acoustics) Computermusic Ensemble

Of course, special care has to be taken when playing together in an ensemble. Factors like network overload, especially peaks, can lead to bad sound and feedback. On the other hand, such situations also occur when playing together in the analog world. In any case, the limits have to be explored during rehearsals.

Setting up continuous streams, where audio data including silence is sent continuously to all possible destinations, is an overhead that can easily reach the limits of the available network bandwidth, and it can also lead to wasteful or costly implementations. Sending audio from different sources to sinks (like speaker systems) only on demand simplifies the setup. Also, reducing the need for negotiation when establishing connections simplifies this task and therefore stabilizes the setup.

The use of messages for the delivery of audio signals in a network seems to contradict the usual implementation of real-time audio processing in digital audio workstations, where mostly continuous synchronized audio streams are used. If these audio messages are sent repeatedly in such a way that they can be combined in time, they can be seen as time-limited audio data streams and supersede continuous audio streams.



Figure 1: First idea of a message-based audio system with sources S_n and drains D_n

Summing up these demands, the overall vision is to implement a distributed audio network where a variable number of nodes act as sound sources and sound sinks (drains). It should be possible to send audio messages from any source to any sink, from multiple sources simultaneously to a single sink, and to broadcast audio messages from one source to multiple sinks. Accordingly, the cross-linking between the audio components is arbitrary, as shown in figure 1.

There should be no “Before you stream audio, you first have to negotiate and connect with ...”. Instead, any participant should be able to just send their audio data to others when needed. The receivers should be able to decide how to handle the audio, depending on whether they can or want to use it.

The following features can be outlined:

• audio signal intercommunication between distributed audio systems

• arbitrary ad hoc connections

• various audio formats and sample rates

• synchronization and the lowest possible latency

• audio data on demand only

The most common way of communication within local networks is Ethernet. Therefore “Audio over Ethernet” has become a widely used technique. However, there is essentially only a single approach: stream-based audio transmission, representing the data as a continuous sequence. For audio messages as on-demand packet-based streams 2 we found no usable implementation (2009). This led to the design and implementation of a new audio transmission protocol for the demands shown above. As a first approach, an implementation in user space (on the application layer), without the need for special OS drivers, was intended. This can also be seen as the idea of “dynamic audio networks”.

2 Audio over OSC 

Looking for a modern, commonly used transmission format for messaging systems within the computer music domain, we found “Open Sound Control” (OSC) [Wright, 2002]. With its flexible URL-style address patterns and its implementation of high-resolution time tags, OSC provides everything needed as a communication format [Schmeder et al., 2010]. The OSC specification points out that it does not require a specific underlying transport protocol, although it is often used over Ethernet networks. In our case this is UDP in a first implementation, but it is not limited to it. TCP/IP can also be used as the transport protocol, but it would make some features obsolete and others more complicated, like the requirement for negotiations to initialize connections. Wolfgang Jäger implemented “Audio over OSC” (AoO) within a first project at the IEM [Jaeger and Ritsch, 2009]. This was used in tests and in “AUON” (all under one net), a concert installation for network art 3.

2.1 the AoO-protocol 

The AoO protocol was defined with simplicity in mind, also targeting small devices like microcontrollers:

AoO message := "#bundle" timestamp <format> <channel> [<channel>, ...]

format := "/AoO/drain/<d>/format" samplerate blocksize overlap mime-type [time correction]

channel := "/AoO/drain/<d>/channel/<c>" id sequence resolution resampling <data>

d ... number of drain (integer)
c ... channel number (integer)
data ... audio data (blob)

2 Not to be confused with “streaming on demand” or UDP packets.

3 Performed 17.1.2010 in Medienkunstlabor Kunsthaus Graz, see http://medienkunstlabor.at/projects/blender/ArtsBirthday17012010/index.html
An AoO message is represented by an OSC bundle with the obligatory timestamp. It contains one format message at the beginning and one or more channel messages.

For the addressing scheme, the structure of the resources in the network is used as the base. Each device in the network with a unique network address (IP number and port number) can have one or more drains. Each of these drains can have one or more channels. There can be an arbitrary number of drains, and each drain can have an arbitrary number of channels. An example address of a channel in a device looks like /AoO/drain/2/channel/3, where the third channel of the second drain in the device is targeted. /AoO is the protocol-specific prefix.

As described in “Best Practices for Open Sound Control” [Schmeder et al., 2010], REST (Representational State Transfer) style is used. With its stateless representation, each message is a singleton containing all the information needed.

In OSC, there is a type of query operator called address pattern matching. These patterns can be used to address multiple channels or drains in one message. Since pattern matching can be computationally intensive, we propose to use only the wildcard character, for addressing all channels of a drain or all drains of a device. For instance, the channel message /AoO/drain/2/channel/* will use the audio data for all channels of the second drain.
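The restricted matching proposed above, where only a whole path segment consisting of '*' acts as a wildcard, is cheap to implement. The following is a minimal C sketch assuming the /AoO prefix; the function name is hypothetical, and this is deliberately not a full OSC pattern matcher (no '?', '[]', '{}' or partial-segment wildcards).

```c
#include <stdbool.h>
#include <string.h>

/* A pattern segment that is exactly "*" matches any one segment of the
 * address; every other segment must match literally. */
bool aoo_address_match(const char *pattern, const char *address)
{
    while (*pattern && *address) {
        if (*pattern != '/' || *address != '/')
            return false;
        pattern++;
        address++;
        size_t plen = strcspn(pattern, "/");   /* length of this segment */
        size_t alen = strcspn(address, "/");
        bool wildcard = (plen == 1 && pattern[0] == '*');
        if (!wildcard && (plen != alen || strncmp(pattern, address, plen) != 0))
            return false;
        pattern += plen;
        address += alen;
    }
    /* Both strings must be used up: same number of segments. */
    return *pattern == '\0' && *address == '\0';
}
```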

An OSC format message, with for example /AoO/drain/2/format as address string, is always the first item in the bundle and specifies the samplerate, the blocksize and the overlap factor as integers, followed by a string giving the MIME type of the audio data. The optional time correction factor is explained in section 2.3.
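To make the layout concrete, the following C sketch encodes a format message as a plain OSC 1.0 message: NUL-padded OSC-strings for the address, the type tag string ",iiis" and the MIME type, and big-endian int32 arguments. It assumes the /AoO prefix and the argument order of the grammar above, omits the optional time correction and the enclosing #bundle, and uses hypothetical function names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Appends an OSC-string (bytes, NUL, padding to a multiple of 4).
 * Returns the new length, or 0 if it does not fit. */
static size_t put_string(uint8_t *buf, size_t len, size_t cap, const char *str)
{
    size_t n = strlen(str) + 1;
    size_t padded = (n + 3) & ~(size_t)3;
    if (len + padded > cap)
        return 0;
    memset(buf + len, 0, padded);
    memcpy(buf + len, str, n - 1);
    return len + padded;
}

/* Appends a big-endian 32-bit integer (OSC type tag 'i'). */
static size_t put_int32(uint8_t *buf, size_t len, size_t cap, int32_t v)
{
    uint32_t u = (uint32_t)v;
    if (len + 4 > cap)
        return 0;
    buf[len] = (uint8_t)(u >> 24);
    buf[len + 1] = (uint8_t)(u >> 16);
    buf[len + 2] = (uint8_t)(u >> 8);
    buf[len + 3] = (uint8_t)u;
    return len + 4;
}

/* Encodes "/AoO/drain/<d>/format" with samplerate, blocksize, overlap
 * and MIME type. Returns the size in bytes, or 0 if cap is too small. */
size_t aoo_encode_format(uint8_t *buf, size_t cap, int drain, int32_t samplerate,
                         int32_t blocksize, int32_t overlap, const char *mime)
{
    char addr[64];
    size_t len = 0;
    snprintf(addr, sizeof addr, "/AoO/drain/%d/format", drain);
    if ((len = put_string(buf, len, cap, addr)) == 0) return 0;
    if ((len = put_string(buf, len, cap, ",iiis")) == 0) return 0;
    if ((len = put_int32(buf, len, cap, samplerate)) == 0) return 0;
    if ((len = put_int32(buf, len, cap, blocksize)) == 0) return 0;
    if ((len = put_int32(buf, len, cap, overlap)) == 0) return 0;
    if ((len = put_string(buf, len, cap, mime)) == 0) return 0;
    return len;
}
```

For drain 2 and "audio/pcm" the message occupies 52 bytes: 20 for the address, 8 for the type tags, 12 for the three integers and 12 for the MIME string.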

Integers were chosen in favor of processors without hardware floating-point support. Channel-specific information, like the id number of the message stream and the sequence number in the channel message, makes it easier to detect lost packets. The resolution of a sample and an individual resampling factor are contained in the channel messages, where the resampling factor enables channels to differ from the samplerate specified in the format message, allowing lower rates for sub-channels or control streams, or higher rates for other specific needs. This also becomes handy if FFT frames are transmitted as data.
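How the id and sequence number help to detect lost packets can be sketched as follows. As an illustration only, this assumes that sequence numbers are non-negative and increase by one per message of a given id without wrapping; the type and function names are hypothetical.

```c
#include <stdint.h>

/* Receive state for one message stream (one id) of a channel. */
typedef struct {
    int32_t id;       /* id of the message stream */
    int32_t last_seq; /* last sequence number seen, -1 before the first */
    long lost;        /* running count of missing packets */
} seq_tracker;

void seq_init(seq_tracker *st, int32_t id)
{
    st->id = id;
    st->last_seq = -1;
    st->lost = 0;
}

/* Feeds one received sequence number. Returns the number of packets
 * missing directly before it (0 if in order); late or duplicate packets
 * (seq <= last) return -1 and leave the state unchanged. */
int32_t seq_update(seq_tracker *st, int32_t seq)
{
    if (st->last_seq >= 0 && seq <= st->last_seq)
        return -1;
    int32_t gap = (st->last_seq < 0) ? 0 : seq - st->last_seq - 1;
    st->lost += gap;
    st->last_seq = seq;
    return gap;
}
```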

Some of the header data is shown in the following summary to explain some specific features of the audio transmission:


sampling rate: Different sampling rates of sources are possible; they are resampled in the drain.

blocksize: The number of samples in each packet of audio data, which must be greater than or equal to 1 and is limited by the packet size.

overlapping factor: The overlapping factor is 1 (one) by default. Higher values can be used to implement redundancy to deal with lost packets, or are needed when sending FFT frames (in future implementations).

resampling factor: is linked to the samplerate in order to be able to choose the precision of each channel individually, using oversampling or similar.

coding: of the audio data uses the Multipurpose Internet Mail Extensions (MIME) standard [Authority, 2009]. In our uncompressed format, the MIME type would be “audio/pcm”, whereas “audio/CELP” classifies CELP-encoded data.

In order to send usable data, sources have to be aware of the formats a given drain can handle 4.

data types: preferred are uncompressed packets with data types defined by OSC, like 32-bit float. However, it is also possible to use blobs with audio data of arbitrary bit length. This can become handy if bandwidth matters. Sources must be aware of which formats can be handled by the drains.

To provide low latency, time-bounded audio transmissions should be sliced into shorter messages and sent individually, to be reconstructed at the receiver.
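For the default overlap factor of 1, slicing reduces to simple arithmetic on the block index. A minimal C sketch with hypothetical names; the conversion of the resulting times into OSC time tags is not shown here.

```c
/* Start time in seconds of block k when a signal starting at t0 is cut
 * into blocks of `blocksize` samples at `samplerate` Hz (overlap 1). */
double block_time(double t0, long k, int blocksize, int samplerate)
{
    return t0 + (double)k * blocksize / samplerate;
}

/* Number of messages needed for `nsamples` samples; the last block may
 * be only partly filled and has to be padded or faded by the sender. */
long block_count(long nsamples, int blocksize)
{
    return (nsamples + blocksize - 1) / blocksize;
}
```

With 64-sample blocks at 48 kHz, for example, consecutive messages are 1.333 ms apart, and block 3 starts 4 ms after the first one.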

2.2 drains 



Figure 2: audio messages are arranged as single, 
combined or overlapped 

Sending audio data is simply slicing and adding timestamps. Receiving, on the other hand, means that audio packets have to be rescheduled and synchronized on the timeline, using the timestamp, sequence and id received, and mixed together. Mixing is required if audio packets come from different sources, have different ids, or overlap (using an overlapping factor greater than one). Audio messages also have to be resampled before they are added, to handle sources with different samplerates. Even if nodes use the same nominal sample rate, they are usually not sample-synchronized, since the sample clocks can drift in time. The resampling factor can therefore change dynamically.

4 This subject is currently under discussion and may change in the future.

For re-arranging the audio packets there is a need for some sort of labeling of the messages, since it is not clear whether they are intended to overlap or are different material. This can be handled via the “identification number” (id). Identical identification numbers mean that the material is recognized as one piece of material, which can be cross-faded. These numbers therefore have to be unique, at least at the drain.

The first audio packet has to be faded in and the last faded out. A sequence of audio messages must be concatenated. At least one message has to be buffered to know whether a next one arrives. If messages are in overlapping mode, they always have to be cross-faded. If one packet is lost in overlapping mode, the signal can be reconstructed.
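The fades and cross-fades required above can all be expressed with a single linear crossfade, sketched here in C. Linear gain ramps and the function name are assumptions of this example; an implementation might prefer equal-power curves for uncorrelated material.

```c
#include <stddef.h>

/* Linear crossfade over n samples: the result starts with `from` and
 * ends with `to`. With from = NULL this is a fade-in, with to = NULL a
 * fade-out; the missing signal is treated as silence. */
void crossfade(const float *from, const float *to, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float g = (n > 1) ? (float)i / (float)(n - 1) : 1.0f; /* 0 -> 1 */
        float a = from ? from[i] : 0.0f;
        float b = to ? to[i] : 0.0f;
        out[i] = (1.0f - g) * a + g * b;
    }
}
```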

2.2.1 addressing problems 

As described above, to deliver audio messages to a drain, the address of the device has to be known in addition to the drain number and channel number. It was decided that the address is not part of the message, since the sender has to know about the drain on the receiver and the network system has to handle the addressing. Since automatic IP assignment can be used, this could defeat the aim of simplifying the network, because the devices then have no static address.

As stated in the vision, we do not want negotiations and requests, but in situations where IP addresses are unknown, we needed a mechanism to discover them. One implementation was an announcement message broadcast by each drain, containing its address and a meaningful, human-readable name. Even more politely, we implemented these as invitation messages, which also state: “ready to receive”. This was done every 10 seconds to limit the load, and it also serves as an alive message.


A second problem arose: since the destination information is not contained in the audio message, a broadcast reaches all drains with the same number, so we cannot use broadcasting to reduce network load while addressing specific destinations. For this, the drain has to know about the sources it will accept. This worked fine, but required some additional communication beforehand.

In any case, addressing is still under heavy discussion, has to be tested further in use cases, and will probably change in the future.

2.2.2 mixing modes 

In this first implementation we used two different modes: Mode 1 sums the received audio signals, while Mode 2 performs an arithmetic averaging of parallel signals. The reason is that summing audio signals that each have maximum amplitude causes distortion, which cannot happen with Mode 2. If many sources play into one drain, averaging can also be seen as a kind of mix that reduces the impact of a single source. Sometimes, automatic level control or limiting in the digital domain after adding the signals is the better way to prevent clipping.
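The difference between the two mixing modes, and a limiter applied after summation, can be sketched as follows. The hard clamp is only the simplest possible limiter and is an assumption for illustration. Practical level control would smooth the gain over time to avoid distortion.

```python
from typing import List, Sequence


def mix_sum(signals: Sequence[Sequence[float]]) -> List[float]:
    """Mode 1: sample-wise summation of parallel signals
    (zip truncates to the shortest signal)."""
    return [sum(frame) for frame in zip(*signals)]


def mix_average(signals: Sequence[Sequence[float]]) -> List[float]:
    """Mode 2: sample-wise arithmetic mean; stays within the input range."""
    n = len(signals)
    return [sum(frame) / n for frame in zip(*signals)]


def hard_limit(signal: Sequence[float], ceiling: float = 1.0) -> List[float]:
    """Crude digital limiter: clamp samples to +/- ceiling after summation."""
    return [max(-ceiling, min(ceiling, x)) for x in signal]
```

Summing two full-scale signals exceeds the range and needs limiting. Averaging them does not.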


Figure 3: The re-sampling rate $R_n$ between source S and drain D is not constant

2.3 timing and sample-rates 

Timing is critical in audio systems, not only for synchronizing audio but also for preventing jitter noise. Time tags of the packets are represented by a 64-bit fixed-point number, as specified by OSC, with a precision of about 230 picoseconds. This conforms to the representation used by the Network Time Protocol (NTP) [Mills et al., 2010].
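The 64-bit time tag stores whole seconds since 1 January 1900 in its upper 32 bits and a binary fraction of a second in its lower 32 bits. Its resolution is therefore 2^-32 s ≈ 233 ps. A minimal conversion sketch follows; the function names are illustrative.

```python
NTP_FRACTION_SCALE = 2 ** 32            # 32 fractional bits
RESOLUTION_S = 1 / NTP_FRACTION_SCALE   # about 2.33e-10 s, i.e. roughly 233 ps


def to_timetag(seconds: float) -> int:
    """Encode seconds since the NTP epoch (1900-01-01) as a 64-bit time tag:
    upper 32 bits = whole seconds, lower 32 bits = fraction of a second.
    Note: a Python float near present-day values (about 3.9e9 s) only resolves
    roughly 0.5 microseconds, so a real implementation would keep seconds and
    fraction as separate integers."""
    whole = int(seconds)
    frac = int(round((seconds - whole) * NTP_FRACTION_SCALE))
    if frac == NTP_FRACTION_SCALE:      # rounding carried into the next second
        whole, frac = whole + 1, 0
    return ((whole & 0xFFFFFFFF) << 32) | frac


def from_timetag(tag: int) -> float:
    """Decode a 64-bit time tag back into (approximate) seconds."""
    return (tag >> 32) + (tag & 0xFFFFFFFF) / NTP_FRACTION_SCALE
```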

In fixed buffering mode, the buffer size has to be chosen large enough to prevent dropouts. In automatic buffer control mode, the drain should use the shortest possible buffer size. If packets arrive too late, the buffering should be extended dynamically and then reduced slowly.
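The automatic buffer control described above could look roughly like the following sketch. The buffer grows at once by the observed lateness and shrinks by one block only after a run of punctual packets. The block granularity, the limits and the shrink rate are hypothetical parameters, not values from the AoO implementation.

```python
class AutoBuffer:
    """Hypothetical automatic buffer control: start with the shortest buffer,
    grow immediately when a packet is late, shrink slowly while packets
    keep arriving in time."""

    def __init__(self, min_blocks: int = 1, max_blocks: int = 32,
                 shrink_after: int = 100) -> None:
        self.min_blocks = min_blocks
        self.max_blocks = max_blocks
        self.shrink_after = shrink_after  # in-time packets needed before shrinking by one block
        self.size = min_blocks
        self._in_time_count = 0

    def on_packet(self, lateness_blocks: int) -> int:
        """lateness_blocks > 0 means the packet missed its slot by that many blocks."""
        if lateness_blocks > 0:
            # Extend quickly so that late material still fits.
            self.size = min(self.max_blocks, self.size + lateness_blocks)
            self._in_time_count = 0
        else:
            self._in_time_count += 1
            if self._in_time_count >= self.shrink_after and self.size > self.min_blocks:
                self.size -= 1            # reduce slowly, one block at a time
                self._in_time_count = 0
        return self.size
```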

Since audio packets can arrive with different sample rates, re-sampling is performed before the audio data is added to the internal sound stream, which is synchronized with the local audio environment. This makes it possible to synchronize audio content while respecting the timing differences and time drifts between sources and drains. This re-sampling strategy is shown in Figure 3.
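As an illustration of re-sampling a stream block by block, the sketch below uses linear interpolation. It carries the last sample and the fractional read position across block boundaries. Linear interpolation is chosen only for brevity (an assumption; a band-limited interpolator gives better quality). The ratio is defined here as output rate divided by input rate.

```python
from typing import List, Optional, Sequence


class LinearResampler:
    """Block-wise linear-interpolation resampler (illustrative only).
    `ratio` is output rate / input rate and may change from block to block,
    e.g. when a drift correction updates it."""

    def __init__(self) -> None:
        self._prev: Optional[float] = None  # last input sample of the previous block
        self._pos = 0.0                     # read position relative to that sample

    def process(self, block: Sequence[float], ratio: float) -> List[float]:
        step = 1.0 / ratio                  # input samples advanced per output sample
        data = ([self._prev] if self._prev is not None else []) + list(block)
        out: List[float] = []
        pos = self._pos
        while pos < len(data) - 1:          # need data[i] and data[i + 1]
            i = int(pos)
            frac = pos - i
            out.append(data[i] * (1.0 - frac) + data[i + 1] * frac)
            pos += step
        if data:
            self._prev = data[-1]
            self._pos = pos - (len(data) - 1)  # rebase onto the kept sample
        return out
```

Because the last input sample is kept, consecutive blocks join without gaps. For example, upsampling [0, 2, 4] and then [6, 8] by 2 yields the ramp 0, 1, ..., 7.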

Synchronization in digital audio systems mostly relies on a common master clock for all devices. Since each device has its own audio environment, which may not support external synchronization sources, the time $t_n$ of the local audio environment is used to calculate the timestamps of outgoing audio messages.

By comparing the incoming timestamps of the remote source with the local time $t_n$, the re-sampling factor $R_n$ can be corrected dynamically for each message. Averaged over a longer time, the changes of the correction should be small, but the correction can be poor for the first audio messages received.
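One way to correct the re-sampling factor per message is to compare the elapsed remote time with the elapsed local time since the first message. Smoothing the result keeps jitter on a single message from moving the estimate much. The sketch below is hypothetical: the smoothing constant and the names are assumptions. The estimate simply starts at the nominal value, which reflects the poor start-up behaviour mentioned above.

```python
from typing import Optional, Tuple


class DriftEstimator:
    """Hypothetical per-message estimator of the re-sampling factor.
    The ratio of elapsed local time to elapsed source time since the first
    message is averaged exponentially (output/input convention)."""

    def __init__(self, alpha: float = 0.01, nominal: float = 1.0) -> None:
        self.alpha = alpha          # smoothing constant (assumption)
        self.ratio = nominal        # current estimate of R_n
        self._first: Optional[Tuple[float, float]] = None  # (source time, local time)

    def update(self, source_time: float, local_time: float) -> float:
        if self._first is None:
            self._first = (source_time, local_time)
            return self.ratio
        ds = source_time - self._first[0]
        dl = local_time - self._first[1]
        if ds > 0:
            measured = dl / ds
            self.ratio += self.alpha * (measured - self.ratio)
        return self.ratio
```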

For better synchronization of audio data, a time transfer protocol can be used in parallel to synchronize the drain with the source, which acts as a sort of master clock.

Therefore, as proposed in the OSC specification, NTP can be used on each node. Another time protocol for the synchronization of audio data is the Precision Time Protocol (PTP) [on Sensor Technology, 1588-2002], which is also used in AVB⁵. PTP allows a more lightweight implementation in local networks and can guarantee quasi sample-accurate synchronization. It is the preferred time protocol for use with AoO. These protocols need a master (or grandmaster) in the network; how it is established depends on the implementation of the time protocol used.
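For reference, the standard PTP delay request-response exchange yields the clock offset and the mean path delay from four timestamps, assuming a symmetric network path. The sketch shows only this arithmetic, not how AoO nodes would use it.

```python
from typing import Tuple


def ptp_offset_and_delay(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """Delay request-response exchange of IEEE 1588 (PTP):
    t1: master sends Sync          (master clock)
    t2: slave receives Sync        (slave clock)
    t3: slave sends Delay_Req      (slave clock)
    t4: master receives Delay_Req  (master clock)
    Assumes the path delay is the same in both directions."""
    forward = t2 - t1                    # path delay + offset
    backward = t4 - t3                   # path delay - offset
    offset = (forward - backward) / 2.0  # slave clock minus master clock
    delay = (forward + backward) / 2.0   # mean one-way path delay
    return offset, delay
```

For example, with a 10 µs path delay and a slave clock 500 µs ahead, timestamps of 0, 510, 600 and 110 µs give back an offset of 500 and a delay of 10.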

⁵ Audio Video Bridging, standard IEEE 802.1

Since the local time source of a device can differ from the timing of its audio environment, each device, including the time master device, needs a correction factor between this time source and the audio hardware time. This factor has to be communicated between the devices so that the re-sampling correction factor can be calculated before the first audio message is sent. This guarantees a quasi sample-synchronous network-wide system starting with the first message sent.
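The correction factors described above could be combined as in the following sketch. It assumes each device measures its actual audio sample rate against the shared network time and reports the ratio to its nominal rate. The resulting factor is output rate over input rate. All names are hypothetical.

```python
def correction_factor(samples_counted: int, network_seconds: float,
                      nominal_rate: float) -> float:
    """k = actual audio hardware rate / nominal rate, measured over an
    interval of shared network (NTP/PTP) time."""
    return samples_counted / (network_seconds * nominal_rate)


def resampling_factor(src_nominal: float, src_k: float,
                      drain_nominal: float, drain_k: float) -> float:
    """Output/input ratio the drain applies to the source's samples so that
    both streams advance at the same speed in network time."""
    return (drain_nominal * drain_k) / (src_nominal * src_k)


# Example: the source's hardware clock runs 100 ppm fast, the drain's is exact.
k_src = correction_factor(480_048, 10.0, 48_000.0)        # about 1.0001
k_drn = correction_factor(480_000, 10.0, 48_000.0)        # 1.0
r = resampling_factor(48_000.0, k_src, 48_000.0, k_drn)   # slightly below 1
```

Since only the factors k need to be exchanged, a drain can compute the factor before the first audio message arrives.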

2.4 Requests 

Asking won’t hurt. If the drain provides information about its capabilities, this can be used to optimize and secure the transmission. However, this information is optional, allowing simple implementations on nodes such as microcontrollers that may be unable to accomplish this task. So far there is no proposal for how to implement such requests; instead, we used announcement/invitation messages to discover nodes in the local network.

2.5 Implementation 

As a first proof of concept, AoO was implemented in user space using Pure Data [Puckette, 1996]. This implementation revealed various problems to be solved in the future. Using the network library iemnet⁶, additional “externals” were written in C to extend the OSC protocol, split continuous audio signals into packets, and mix OSC audio messages in drains. The GPL-licensed source code can be found in the git repository on the “Opensource@IEM” SourceForge site at:

http://sourceforge.net/p/iem/aoo/ 
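Splitting a continuous signal into time-stamped packets, as the externals do, can be sketched as below. The message fields mirror concepts named in this paper (drain, channel, identification number, time tag), but they are assumptions and do not reproduce the actual AoO OSC message layout.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence


@dataclass
class AudioMessage:
    # Hypothetical fields; not the actual AoO OSC message layout.
    drain: int
    channel: int
    material_id: int
    timestamp: float      # time of the first sample, in seconds
    samples: List[float]


def packetize(stream: Sequence[float], block: int, sample_rate: float,
              start_time: float, drain: int, channel: int,
              material_id: int) -> Iterator[AudioMessage]:
    """Split a continuous signal into consecutive messages of `block` samples,
    each time-stamped with the time of its first sample."""
    for start in range(0, len(stream), block):
        yield AudioMessage(drain, channel, material_id,
                           start_time + start / sample_rate,
                           list(stream[start:start + block]))
```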

As a first test environment, a number of different open-source audio hardware implementations running Debian Linux were used. The test patches were written with Pd version 0.42.5, where a central component was the OSC library by Martin Peach. Later, an implementation for the microcontroller board “escher2” [Algorythmics, 2012] was created. It has since been superseded by small embedded ARM devices such as BeagleBones [Foundation, 2013], also running Debian.

3 message based Ambisonics spatial audio systems

As a first goal, the geodesic sound dome in Pischelsdorf (with a diameter of 20 m and a height of about 10 m), an environmental landscape sculpture, should be transformed into a 3D sound sphere. For this purpose, a low-power, solar-powered multichannel Ambisonics system with special hardware and software was developed and installed as a prototype. This should result in a low-cost implementation of a multichannel audio system. Up to 48 speakers should be mounted in a hemisphere, forming an Ambisonics sound system. Using 6 nodes, each with 8 speakers, special embedded controllers render the audio in the system (Figure 4).

Figure 4: AoO with embedded devices for spatial audio system

⁶ The iemnet project site is http://puredata.info/downloads/iemnet



The first implementation of the nodes was done with special microcontroller boards, escher2 [Algorythmics, 2012], which drive the custom-designed DA-amplifier boards. These devices have very limited memory (max. 16 samples of 64 channels). A standard Linux audio system therefore cannot deliver packets small and fast enough for stable performance without special efforts, such as a dedicated kernel-space driver for packet delivery. Consequently, the major problems were synchronization and the reliability of the transmission while still providing low latency.

Another solution could be to secure resources such as bandwidth and computing time by restricting audio data to defined time slots, with exactly one time slot for each device. Most available network components are able to handle the IP protocol or even OSC, but unfortunately no commonly used computer OS is able to deliver audio data in dedicated time slots. RTnet [Team, 2002] was identified as an implementation of hard real-time networking for real-time Linux; it requires a hard real-time kernel. Taking this further, the OSC transmission would have to be implemented as a Linux device coupled with the RTnet Ethernet driver.

Since 2012, small embedded Linux systems like the BeagleBone Black [Foundation, 2013] have been available and can be used to drive the DAs with amplifiers. This has recently been tested with good success on a BeagleBone Black: an acceptable latency of 5-10 ms with 8 output channels has been achieved.


Figure 5: One node with one speaker in the 
dome 

Each node is a small embedded computer equipped with an 8-channel sound card, including amplifiers and speakers. Each speaker can be calibrated and fed individually. However, since each unit is aware of its speaker positions, it can also render the audio with an internal Ambisonics encoder/decoder combination.

So instead of sending 48 channels of audio to spatialize one or more sources, the sources can be broadcast together with OSC spatialization data, and the drains render them independently. Another possibility is to broadcast an Ambisonics-encoded multichannel signal, from which each device decodes the signals for its own subset of speakers. The sound environment can be sent from one master controller or from any other connected computer.
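To illustrate how a node can decode only its own speakers from a broadcast Ambisonics signal, the sketch below encodes a mono sample at first order. It then applies a simple cardioid-style sampling decoder per speaker. The normalization (SN3D-style, W unscaled) and the decoder are assumptions chosen for brevity. A real 48-speaker dome would use higher orders and an optimized decoder.

```python
import math
from typing import List, Sequence, Tuple

Direction = Tuple[float, float]  # (azimuth, elevation) in radians


def unit_vector(d: Direction) -> Tuple[float, float, float]:
    az, el = d
    return (math.cos(az) * math.cos(el), math.sin(az) * math.cos(el), math.sin(el))


def encode_first_order(sample: float, source: Direction) -> Tuple[float, float, float, float]:
    """Encode one mono sample into first-order components (W, X, Y, Z),
    with W unscaled (SN3D-style normalization, an assumption)."""
    x, y, z = unit_vector(source)
    return (sample, sample * x, sample * y, sample * z)


def decode_for_node(bformat: Tuple[float, float, float, float],
                    speakers: Sequence[Direction]) -> List[float]:
    """Decode only this node's speakers with a simple cardioid-style
    sampling decoder: g = (W + X*ux + Y*uy + Z*uz) / 2."""
    w, x, y, z = bformat
    gains = []
    for spk in speakers:
        ux, uy, uz = unit_vector(spk)
        gains.append(0.5 * (w + x * ux + y * uy + z * uz))
    return gains
```

Each node only needs the broadcast components and the directions of its own speakers. A source straight ahead yields full gain on a front speaker, half gain on a side speaker and silence on a rear speaker.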



Figure 6: Sound dome as hemisphere, 20 m diameter, in a cornfield

The main advantage, besides the low cost and the autonomy of the system, is that one or more sound technicians or computer musicians can enter the dome, plug into the network with their portable devices, and play the sound dome. They can address speakers individually, spatialize audio material live with additional OSC messages, or send generated or prerecorded Ambisonics material.

3.1 Playing together 



Figure 7: first concert of IEM computermusic 
ensemble ICE playing over a HUB 

When specifying an audio network for playing together within an ensemble, the focus was set on the collaborative effort needed to unite the individual players.

Like a musician with an acoustic instrument, a musician joining a band with a Linux audio computer needs a place to play: a “virtual sound space” they can join. They provide sound sources and need to plug audio channels into a virtual mixing desk. With AoO, the participant just needs to connect to the network, wireless or wired, choose the drains to play to, and send phrases of audio when needed.

For the ICE ensemble, Ambisonics was chosen as the virtual audio environment, since it can be rendered in different concert halls. Within the Ambisonics environment, each musician can always use the same playing parameters for spatializing her or his musical contribution. The musician thus imagines “playing in a virtual 3D environment”, sending their audio signals together with 3D spatial data to a distributed mixing system that renders them on the speakers.

Additionally, there is audio communication between the musicians. Each musician can listen to the signals produced by the others, if any, or send audio to the others via specially offered drains, e.g. for monitoring purposes. The musicians can make their own monitor mix, depending on the piece and the space where they play.

In a message-based audio system, each musician only sends sound data while playing, e.g. audio bursts of single notes. Alternatively, a musician sends their audio data to another musician, who processes it further, and so on. There should be no limit to the imagination in these situations (as long as they can be grasped by the participants).
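Sending audio only while playing can be approximated by a block gate, as in this sketch. The RMS threshold and the number of hold blocks that keep note tails intact are illustrative assumptions.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def rms(block: Sequence[float]) -> float:
    """Root-mean-square level of one block (0.0 for an empty block)."""
    return math.sqrt(sum(x * x for x in block) / len(block)) if block else 0.0


def blocks_to_send(blocks: Iterable[Sequence[float]], threshold: float = 0.01,
                   hold: int = 2) -> List[Tuple[int, Sequence[float]]]:
    """Return (index, block) pairs worth sending: blocks above an RMS
    threshold, plus up to `hold` following blocks so that note tails are
    not cut off. Silent blocks outside the hold time are not sent."""
    selected = []
    remaining_hold = 0
    for i, block in enumerate(blocks):
        if rms(block) >= threshold:
            remaining_hold = hold
            selected.append((i, block))
        elif remaining_hold > 0:
            remaining_hold -= 1
            selected.append((i, block))
    return selected
```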



Figure 8: ICE using AoO as a space for playing together and on a PA system (Ambisonics system 3D/2D + SUB)

4 state of the work 

AoO has been implemented in a first draft version as a proof of concept and for special applications. After public discussion, the next version should fix the protocol in a way that keeps it compatible with future protocol upgrades.

The use of AoO in an ensemble was explored in a workshop with students at the IEM, but the implemented software was not stable enough on the different platforms used for stage performance. This was especially true when we tried to reach the short latencies needed for concerts. More programming effort is needed to guarantee better timing on different computer types and within different Linux implementations and setups.

Running AoO on embedded Linux devices has proven successful if the devices are tweaked for real-time audio use. Development on the escher2 (dsPIC33E) microcontroller board has been abandoned in favor of the new generation of small, low-power embedded devices with ARM processors. A first version of the AoO implementation (V1.0) is scheduled for April 2014 for a public installation in the sound dome, where the Ambisonics audio system should be finalized for permanent performance and open access. More documentation and source code should be released, and open hardware for AoO audio devices should be made available.

Special focus is placed on using embedded devices with AoO as networked multichannel audio hardware interfaces for low-cost solutions. These interfaces can add audio processing, such as calibration filters and beamforming, for speaker systems, optionally powered over Ethernet.

5 Conclusions 

Starting from a vision, these experiments and implementations have shown that message-based audio systems can enhance collaboration in ensembles playing open audio systems. Network art projects using the Internet can also use AoO to contribute to sound installations from outside, just knowing the IP address and ports to use.

The implementation is far from complete, and more restrictions will be introduced in order to simplify the system. Synchronization and re-sampling are not perfect but usable in most cases, and it has been shown that audio message systems can work reliably in different situations.

Audio message systems can also be implemented in formats other than OSC and in lower layers of the Linux OS, e.g. as JACK plugins or ALSA modules acting as converters between message-based audio systems and synchronous data-flow models.

For really low latency (below 1 ms) with AoO as an audio-over-Ethernet system, kernel drivers must be developed. With time-slotted Ethernet transmission, systems with transmission latencies down to 8 µs can be implemented using hard real-time systems.
Abstract

Today’s public Internet availability and capabilities allow manifold applications in the field of multimedia that were not possible a few years ago. One emerging application is the so-called Networked Music Performance, which stands for the online, low-latency interaction of musicians. This work proposes a stand-alone device for that specific purpose, based on a Raspberry Pi running a Linux-based operating system.

Keywords 

Networked Music Performance, Audio over IP, 
ALSA, Raspberry Pi 

1 Introduction 

The ways of today’s online communication are versatile and rapidly evolving. The trend went from text-based communication, via audio-based communication, to multimedia-based communication. One emerging branch of online communication is the so-called Networked Music Performance (NMP), a special application of Audio over IP (AoIP). It allows musicians to interact with each other in a virtual acoustic space by connecting their instruments to their computers, which are linked via software. This allows artistic collaboration over long distances without the need to travel and can hence enrich the lives of artists. Rather than increasing the content dimensionality and therefore the data rate, the challenge in AoIP is to stay below a certain delay threshold that still allows musical interaction.

To provide an easy-to-use system realization, an all-in-one device called the JamBerry is presented in this work. The proposed system, as shown in Fig. 1, consists of the well-known Raspberry Pi [1] and several custom hardware extensions. These are necessary since the Raspberry Pi provides neither high-quality audio output nor any audio input. Furthermore, the proposed device includes several hardware components that allow quick and simple connection of typical audio hardware and instruments. The Raspberry Pi itself can be described as a chip-card-sized single-board computer. It was initiated for educational purposes and is now widely used, especially in the hardware hobbyist community, since it provides various interfaces for all sorts of extensions.



Figure 1: The JamBerry Device 


The paper is structured as follows. Section 2 introduces Audio over IP, including the requirements and major challenges of building such a system. Section 3 gives a detailed view of the actual AoIP software running on the JamBerry. The necessary extensions of the Linux audio drivers and their integration into the ALSA framework are described in Section 4. The custom hardware extensions to the Raspberry Pi are explained in Section 5. Section 6 highlights the capabilities of the JamBerry in terms of audio and network parameters, and concluding thoughts can be found in Section 7.

2 Audio over IP 

Transmission of audio over IP-based networks is nowadays a widespread technology with two main applications: broadcasting audio streams and telephony. While the first provides no return channel, the second allows direct interaction over large distances. However, current telephony systems do not fulfill the requirements in terms of audio quality and latency for playing live music together.

The widespread availability of broadband Internet connections and the increased reliability of networks now make AoIP systems feasible. Consequently, the topic has gained much research attention in recent years. A good introduction to Networked Music Performances and the associated problems can be found in [2], while [3] gives an extensive overview of existing systems.

An early approach was SoundWIRE [4] by the Center for Computer Research in Music and Acoustics (CCRMA), where JackTrip [5] was later developed. JackTrip includes several methods for counteracting packet loss, such as overlapping of packets and looping of data in case of a lost packet. It is based on the JACK sound system, just like NetJack [6], which is now part of JACK itself. To avoid the restriction to JACK, Soundjack [7] is based on a generic audio core and hence allows cross-platform musical online interaction.

The Distributed Musical Rehearsal Environment [8] focuses on preparing groups of musicians for a final performance without the need to be at the same place. Remote rehearsal is also one of the applications of the DIAMOUSES framework [9], which has a very versatile platform including a portal for arranging jam sessions, MIDI support, and DVB support for audience involvement.

2.1 Requirements 

The goal of this project was to build a complete distributed music performance system to show the current state of research and to establish a platform for further research. The system is supposed to be usable in realistic environments such as rehearsal rooms. Therefore, it should be a compact system that integrates all important features for easy-to-set-up jam sessions. This includes two input channels with various input capabilities. They must support high-amplitude sound sources such as keyboards or preamplified instruments, as well as low-amplitude sound sources like microphones and passive guitar pickups. Furthermore, the system should drive headphones and provide line-level output signals.

The system should support a sampling rate of 48 kHz with a bit depth of 16 bit. Higher values do not provide much benefit in quality, and no further signal processing steps that depend on highly detailed signal representations are involved. To allow interaction among several musicians while staying within the computational constraints of the Raspberry Pi, the system shall support up to four interconnected JamBerries.
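The resulting data rates can be checked with a few lines of arithmetic. The sketch makes several assumptions. It uses uncompressed mono PCM, matching the downmix to one channel and the unicast to every peer described for the software in Section 3. It ignores UDP/IP header overhead and uses the block sizes of 120 and 240 samples mentioned there.

```python
SAMPLE_RATE_HZ = 48_000
BIT_DEPTH = 16
MAX_JAMBERRIES = 4


def raw_stream_kbit_s(channels: int = 1) -> float:
    """Uncompressed PCM payload rate in kbit/s, without UDP/IP headers."""
    return SAMPLE_RATE_HZ * BIT_DEPTH * channels / 1000


def block_duration_ms(block_samples: int) -> float:
    """Audio time covered by one block."""
    return 1000 * block_samples / SAMPLE_RATE_HZ


def packets_per_second(block_samples: int) -> float:
    """Packets per second if every block is sent as one packet."""
    return SAMPLE_RATE_HZ / block_samples


def upstream_kbit_s(devices: int = MAX_JAMBERRIES) -> float:
    """Each device sends its mono stream to every other device via unicast."""
    return (devices - 1) * raw_stream_kbit_s()


for block in (120, 240):
    print(block, block_duration_ms(block), packets_per_second(block))
print(raw_stream_kbit_s(), upstream_kbit_s())
```

Under these assumptions, a full session of four devices requires about 2.3 Mbit/s of upstream payload per device before any Opus compression.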

2.2 Challenges 

Transmission of audio via the Internet is considerably different from point-to-point digital audio transmission techniques such as AES/EBU [10] and even from Audio over Ethernet (AoE) techniques like Dante or EtherSound [11]. The transmission of data packets via the Internet is neither reliable nor properly predictable. As a result, audio data can be considerably delayed or even lost in the network.

This is commonly counteracted by using large data buffers in which packets arriving at irregular time intervals are additionally delayed so that late packets can catch up. Unfortunately, large buffers contradict the requirements of distributed music performances, since minimum latency is essential: interaction of several musicians is only possible when the round-trip delay does not exceed a certain threshold [12; 13]. Secondly, even large buffers do not prevent dropouts resulting from lost packets. Therefore, this project takes two complementary approaches:

• Audio data packets that do not arrive in time are substituted by a technique called error concealment. Instead of playing back silence, audio is calculated from preceding data.

• The data buffer length is dynamically adjusted to the network conditions. This enables minimum latency while still providing good audio quality.
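The simplest form of error concealment, repeating the last good block, can be sketched as follows. By default the JamBerry uses the Opus concealment procedure. This sketch only shows the repetition alternative (the “wavetable mode” of [5]), and its names are hypothetical.

```python
from typing import Dict, List, Optional


def playout(received: Dict[int, List[float]], first_seq: int, count: int,
            block_len: int) -> List[List[float]]:
    """Produce `count` consecutive blocks starting at sequence number
    `first_seq`. Blocks missing from `received` are concealed by repeating
    the last good block; silence is used only if nothing has arrived yet."""
    out: List[List[float]] = []
    last_good: Optional[List[float]] = None
    for seq in range(first_seq, first_seq + count):
        block = received.get(seq)
        if block is not None:
            last_good = block
            out.append(block)
        elif last_good is not None:
            out.append(list(last_good))     # conceal by repetition
        else:
            out.append([0.0] * block_len)   # nothing to repeat yet
    return out
```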

3 Software System 

The AoIP software itself is a multi-threaded C++11 application running in user space. It accesses the audio hardware via the well-known ALSA [14] library. User interaction takes place via a WebSocket interface that enables the use of a JavaScript/HTML GUI, which can be accessed via the integrated touchscreen as well as from a remote PC or tablet. The WebSocket interface is provided by a library [15], written during this project, that implements the WAMP [16] protocol.
The data flow of the audio through the software is depicted in Fig. 2. Audio is captured from the hardware via the ALSA library. As soon as a block (120 or 240 samples) of new data is available, it is taken by the CaptureController, which mixes the signal down to a single channel. Transmitting multiple streams is possible, too, but provides a negligible benefit in this scenario. The data can be transmitted as raw data. Alternatively, the required data rate can be reduced by using the Opus [17; 18] low-latency coding procedure. The encoding is done by the EncoderStream, which passes the data to the sender for sending it to all connected peers via unicast UDP. Currently, no discovery protocol is implemented, so the peers have to be entered manually. As soon as the data is received at the receiver, it is decoded and pushed into the receiver buffer queue. The PlaybackController mixes the data from the various sources and makes the result available to ALSA. Thus, continuous reading of data is realized. In the case of missing data, an error concealment procedure is triggered to replace the missing data and avoid gaps in the playback. The current implementation uses the concealment procedure of the Opus codec, since its complexity is low compared with other known concealment strategies [19; 20; 21]. Alternatively, the last block can be repeated until newer data arrives (so-called “wavetable mode” as in [5]). The queuing process at the receiver is explained in more detail in the following.

Figure 2: Data flow of the JamBerry software

3.1 Adaptive Queuing

In order to achieve minimum latency while maintaining good audio quality, the length of the audio buffer is adjusted to the network conditions within the playback thread. The corresponding control routine is depicted in Fig. 3.

Figure 3: Process of Playback Thread

The ALSA data queue is kept very short to avoid unnecessary delays that would increase the overall latency. The PlaybackController monitors the state of ALSA, and the data is written to the ALSA buffer just before the hardware requests new data. If current audio data exists at the moment of the hardware request, this data is used. In the case of missing data, the error concealment routine is triggered to produce the corresponding data. Since the computation of concealment data takes some time, this period is taken into account so that the data is provided at exactly the right point in time.

In order to maintain a reasonable buffer size, a simple open-loop control was implemented. An unreasonably large buffer would result in useless delay. When the buffer is too small, a major part of the audio packets arrives too late. Although a certain amount of packets can be concealed, the audio quality decreases with a rising amount of lost packets.

Right after a new connection is established, the buffer size is set to a very high value. In the following few seconds, the length of the queue in samples $Q$ is measured and its standard deviation $\sigma_Q$ is calculated. After the measurement phase is over, the optimal queue length is calculated as

$$Q_{\mathrm{opt}} = \beta \cdot \sigma_Q, \qquad (1)$$

where the constant $\beta > 1$ accounts for packets outside the range of the standard deviation. When the current queue length is outside the interval


$$\left[Q_{\mathrm{opt}} - Q_{\mathrm{tol}},\; Q_{\mathrm{opt}} + Q_{\mathrm{tol}}\right], \qquad (2)$$

the corresponding number of samples is dropped or generated. Once the queue is adjusted to the current network characteristics, this occurs very infrequently, so the audible effect is insignificant. The parameters $\beta$ and $Q_{\mathrm{tol}}$ are used to trade off the amount of lost packets, and therefore the audio quality, against the latency.

4 Linux Kernel Driver

The Raspberry Pi has neither audio input nor proper audio output. The existing audio output is driven by a pulse-width modulation (PWM) interface providing medium-quality audio. Fortunately, there is another possibility for audio transmission: the Broadcom SoC on the Raspberry Pi provides a digital I²S interface [22] that can be accessed via pin headers. Together with an external audio codec, as explained in the next section, this enables high-quality audio input and output. However, the Linux kernel lacked support for the I²S peripheral of the Raspberry Pi. An integral part of this project was therefore to write an appropriate kernel driver.

Since this driver should be as generic as possible, it is implemented as part of the ALSA System on Chip (ASoC) framework. ASoC is a subsystem of ALSA tailored to the needs of embedded systems. It provides some helpful abstractions that make it easy to adapt the driver for use with other external hardware. Indeed, quite a large number of both open and commercial hardware products today use the driver developed during this project.


Figure 4: Structure of ASoC and its embedding into the Linux audio framework


Fig. 4 depicts the general structure of ASoC as used for this project. When an application starts the playback of audio, it calls the corresponding function of the ALSA library. This in turn calls the appropriate initializers of the involved peripheral drivers that are listed in the machine driver. In particular, these are the codec driver, which is responsible for control commands via I²C, the I²S driver for controlling the digital audio interface, and the platform driver for commanding the DMA engine driver. DMA (Direct Memory Access) is responsible for transmitting audio data from the main memory to the I²S peripheral and back. The I²S peripheral forwards this data via the I²S interface to the audio codec. To start playback on the codec, the codec driver sends an appropriate command using the I²C subsystem. The codec driver is also used for transmitting other codec settings such as volume.

These encapsulations and generic interfaces are the reason for the flexibility and reusability of the software structure. To use a new audio codec with the Raspberry Pi, only the codec driver and the slim machine driver have to be replaced. In many cases, only the wiring in the machine driver has to be adapted, since many codec drivers are already available. This wide availability of drivers results from the frequent use of ASoC on different platforms.
5 Hardware

Since the Raspberry Pi does not provide proper analog audio interfaces, major effort was spent on designing audio hardware that matches the NMP requirements. Furthermore, a touchscreen for user-friendly interaction was connected, which requires interfacing hardware. Due to these extensions, the JamBerry can be used as a stand-alone device without the need for external peripherals such as a monitor.

An overview of the external hardware is depicted in Fig. 5. The extension’s functionality is distributed module-wise over three printed circuit boards. The codec board contains the audio codec for conversion between the analog and digital domains and is stack-mounted on the Raspberry Pi. It is connected to the amplifier board, which contains several amplifiers and connectors. The third board controls the touchscreen and is connected to the Raspberry Pi via HDMI. In the following, the individual boards are explained in more detail.
Figure 5: Hardware Overview

5.1 Codec Board 

The main component on the digital audio board is a CS4270 audio codec by Cirrus Logic. It has integrated AD and DA converters that provide sample rates of up to 192 kHz and a maximum of 24 bits per sample. It is connected to the I²S interface of the Raspberry Pi for the transmission of digital audio and to the I²C interface for control. A linear voltage regulator provides power for the analog part of the audio codec, while the digital part is driven directly by the voltage line of the Raspberry Pi. The audio codec is clocked by an external master clock generator. This enables fine-grained synchronization of the sampling frequency on different devices and prevents clock drifts, as shown in [23]. The MAX9485 clock generator provides this possibility through a voltage-controlled oscillator that can be tuned by an external DAC.

5.2 Amplifier Board 

The analog audio board is designed to provide the most common connection options. On the input side, two combined XLR/TRS connectors allow the connection of various sources such as microphones, guitars, or keyboards. These sources provide different output levels that have to be amplified to line level before being fed into the audio codec. For this purpose, a two-stage non-inverting operational amplifier circuit is placed in front of the AD converter for each channel. It is based on OPA2134 amplifiers by Texas Instruments, which have proven their usability in previous guitar amplifier projects. The circuit allows an amplification of up to 68 dB.

On the output side, a direct line-level output is provided as well as a MAX13331 headphone amplifier, which can deliver up to 135 mW into 32 Ω headphones. Furthermore, the analog audio board contains the main power supply of the JamBerry.
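The figures given for the analog stages translate into linear quantities as follows. The 68 dB gain corresponds to a voltage factor of about 2512. Delivering 135 mW into 32 Ω corresponds to about 2.08 V RMS. The resistor values below for a two-stage non-inverting amplifier are hypothetical and merely show how two stages of about 34 dB each reach this range.

```python
import math


def db_to_voltage_gain(db: float) -> float:
    """Convert a gain in dB to a linear voltage factor."""
    return 10 ** (db / 20)


def voltage_gain_to_db(gain: float) -> float:
    """Convert a linear voltage factor to dB."""
    return 20 * math.log10(gain)


def non_inverting_gain(r_feedback: float, r_ground: float) -> float:
    """Voltage gain of a non-inverting op-amp stage: 1 + Rf / Rg."""
    return 1 + r_feedback / r_ground


def rms_voltage(power_w: float, load_ohm: float) -> float:
    """RMS voltage needed to deliver power_w into a resistive load: sqrt(P * R)."""
    return math.sqrt(power_w * load_ohm)


# Hypothetical resistor values: two identical stages of 1 + 49k/1k = 50 (about 34 dB each).
stage = non_inverting_gain(49e3, 1e3)
print(voltage_gain_to_db(stage * stage))   # about 68 dB in total
print(db_to_voltage_gain(68))              # about 2512
print(rms_voltage(0.135, 32))              # about 2.08 V RMS
```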

5.3 Touchscreen Driving Board 

In order to provide enough display space for a pleasant user experience while still maintaining a compact system size, a 7" screen is used. A frequently used, thus reliable, and affordable resistive touchscreen of that size is the AT070TN92. To use it together with the Raspberry Pi, a video signal converter is needed to translate from HDMI to the 24-bit parallel interface of the TFT screen. This is provided by a TFP401A by Texas Instruments. The touch position on the screen can be determined by measuring the resistance across the touch panel. This measurement is subject to noise that induces jitter and results in imprecise mouse pointer positions. The AD7879-1W touch controller is used to conduct this measurement since it provides integrated mean and median filters that reduce the jitter and is controlled via I2C. A DAC controlling the backlight of the TFT uses the same interface. An additional cable for the I2C connection was avoided by reusing the DDC interface inside the HDMI cable as carrier for the touch and brightness information.
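The AD7879-1W performs its median and mean filtering in hardware. To illustrate the idea only, the sketch below applies a comparable two-step filter in software to a burst of raw touch coordinate readings: sorting and discarding the extreme values rejects outliers (median-style), and averaging the remaining middle values smooths the residual noise, i.e. a trimmed mean. The window size and this particular combination are assumptions; the code does not reproduce the controller's internal algorithm or its register interface.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_SAMPLES 16

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Filter one burst of n raw (non-negative) ADC readings of a touch
 * coordinate: sort a copy, drop `trim` values from each end to reject
 * spikes and dropouts, and return the mean of the remaining values.
 * With trim = (n - 1) / 2 and odd n this reduces to a plain median.
 * Returns -1 for invalid arguments. */
int filter_touch_samples(const int *raw, int n, int trim)
{
    int buf[MAX_SAMPLES];
    if (n <= 0 || n > MAX_SAMPLES || trim < 0 || 2 * trim >= n)
        return -1;
    memcpy(buf, raw, (size_t)n * sizeof buf[0]);
    qsort(buf, (size_t)n, sizeof buf[0], cmp_int);
    long sum = 0;
    for (int i = trim; i < n - trim; i++)
        sum += buf[i];
    return (int)(sum / (n - 2 * trim));
}
```

For example, the burst {512, 515, 510, 4000, 513, 511, 0, 514} contains one spike and one dropout; with trim = 2 the result is 512, whereas a plain mean (trim = 0) would give 884.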
6 Evaluation

The system was evaluated in terms of overall 
latency introduced by the network as well as 
audio quality. 

6.1 Network 

In order to evaluate the behavior of the system under various reproducible network conditions, a network simulator was implemented. Fig. 6 shows a single JamBerry device connected to the simulator, which bounces the received data back to the sender.



Figure 6: Software Evaluation System 


To calibrate the simulator to real conditions, a network connection of 13 hops to a server located 450 km away is used. Fig. 7 shows the distribution of the packet delay. The average delay is about 18 ms with a standard deviation of 4 ms.
Figure 7: Time series and histogram of the packet delay over the test route


The overall latency is measured by generating short sine bursts and feeding them into the JamBerry. This signal is compared to the resulting output signal by means of an oscilloscope. In addition, GPIO pins of the Raspberry Pi are toggled when the sine burst is processed in different software modules as presented in Sect. 3.
Figure 8: Journey of a Sine Burst

A resulting oscillogram can be seen in Fig. 8. The overall latency is about 40 ms. The time between sending and reception is 15 ms, which matches the time for the actual transmission. Between decoding and mixing, the packet is delayed in the buffer queue for about 16 ms. This buffering is needed to compensate for the high jitter of the connection.

For the following evaluation, the overall latency is measured using the above method while recording the number of lost packets. Fig. 9 demonstrates the influence of the factor β in Eq. (1) at a constant jitter variance of 9.5 ms². With low β, the optimal queue length is short, so the overall latency is short, too. However, since there is less time for late packets to catch up, the packet loss is very high. With increasing β, the number of lost packets decreases, but the latency increases. Since sophisticated error concealment algorithms can compensate for up to 2% packet loss [19], a constant β = 3 is chosen for the next evaluation, which is illustrated in Fig. 10. It demonstrates how the control algorithm handles various network conditions. With increasing network jitter variance, the system is able to adapt itself by using a longer queue length. This increases the overall latency, but not the packet loss, so the audio quality stays constant.
6.2 Audio


The evaluation of the JamBerry's audio quality was performed module-wise: the audio output, the audio input, the headphone amplifier and the preamplifiers were inspected independently. First of all, the superiority of the proposed audio output over the original PWM output shall be demonstrated.


Figure 11: THD of the Raspberry Pi PWM and the codec board output for a 1 kHz sine using different signal levels (legend: PWM output, codec output)

If a 1 kHz sine tone is played back through each of the two outputs and the corresponding output spectra are inspected, as done in Fig. 12, it becomes apparent that the quality is increased significantly by the new codec board. The PWM output introduces more distortion, visible in Fig. 12 in the form of larger harmonics at multiples of the fundamental frequency. For example, the amplitude of the first harmonic differs by about 40 dB. Also, the noise floor at higher frequencies is significantly lower with the codec board; a difference of up to 10 dB can be recognized in Fig. 12. At 50 Hz, ripple voltage from the power supply can be seen. Using a power supply of higher quality can reduce this disturbance.

                     Level     THD      SNR
                     (dBFS)    (dB)     (dB)
  Outputs
    PWM output         0       -57       55
    Codec output       0       -81       80
  Inputs
    Codec input        0       -91       71

Table 1: Digital audio hardware characteristics

                     Gain      THD      SNR
                     (dB)      (dB)     (dB)
  Amplifiers
    Headphone         16       -85       79
    Input             17       -81       66
    Input             34       -74       48

Table 2: Analog audio hardware characteristics

The distortion and noise that audio hardware introduces into audio signals are typically expressed as total harmonic distortion (THD) and signal-to-noise ratio (SNR), respectively. THD describes, in most conventions, the ratio of the energy of the harmonics produced by distortion to the energy of the actual signal. In contrast, SNR represents the ratio between the original signal energy and the added noise.

The THDs of the two outputs are illustrated for several signal levels in Fig. 11. Apparently, the THD of the new codec board is at least 20 dB lower than that of the original output for all analyzed signal levels.

Figure 9: Latency and packet loss against packet loss tuning factor β

Figure 10: Adaptation of latency and packet loss to various jitter conditions

Figure 12: Spectra of the Raspberry Pi PWM and the codec board output for a 1 kHz sine

These outcomes and the corresponding measurement results of the other audio hardware modules are listed in Tab. 1 and 2. The identified values should allow high-quality capturing and playback of instruments. Analog amplification always adds noise; therefore, the THD and SNR values of the input amplifier degrade as the gain increases. For even better quality, the flexibility of the device allows connecting almost any kind of music equipment, like favored guitar amplifiers or vintage synthesizers.

7 Conclusions 

The goal of this project was to create a standalone device, called the JamBerry, capable of delivering the well-known experience of a distributed network performance in a user-friendly way. The device is based on the popular Raspberry Pi and is enhanced by several custom hardware extensions: a digital and an analog extension board, providing high-quality audio interfaces to the Raspberry Pi, and a touchscreen to allow standalone operation of the device.

The performance was evaluated under lab conditions, and the authors assume that the system, and especially its audio quality, will satisfy the needs of most musicians. Besides the described device design proposal, the main author shares the ALSA kernel driver that has been included in the mainline Linux kernel since version 3.14, allowing versatile connection of the Raspberry Pi to external audio hardware.
Abstract

The idea is simple and obvious: Take some Raspberry Pi computing units, each serving as a reusable synthesizer module. Connect them via a network. Connect a notebook or PC to control and monitor them. Start playing on your virtual analog modular synthesizer. However, is existing Linux audio software sufficiently mature to implement this vision out of the box? We investigate how far we get in building such a synthesizer and which existing software to choose, with a focus on networking, and analyse what limits we hit and what features still need to be implemented to make our vision become reality.
Figure 1: The Vision


1 The Vision 

The popular Raspberry Pi (or, shortly, RPi) [Raspberry Pi Foundation, 2014b] is a small, cheap, yet powerful computing unit with many I/O jacks, with Linux/ARMv6 available as operating system (OS). It is predestined for building networks of collaborative modules, with each RPi taking over the role of a synthesizer module with a dedicated function, e.g. oscillator, filter or modulator. Using the RPi's General Purpose I/O (GPIO) pins or SPI interface, only minimal circuitry is required to equip the RPi with knobs (e.g. potentiometers or rotary encoders) or sliders, preferably on a separate, tiny board, also called a shield. This way, you get a distributed user interface, with knobs and sliders located directly on the module that they control. Modules can be added to or removed from the network in a hot-plugging manner. If, for a different setup, you need, say, more oscillators and fewer modulators, you may change the role of a module simply by changing the software that it runs.

Compared with a virtual analog modular synthesizer running on a notebook or PC, the approach of a network of RPis reveals several advantages:

• Dedicated System. The RPis are solely used for synthesis. The OS, residing on an SD card, can be tailored to this purpose. Many services are irrelevant for headless mode or use in a synthesizer and thus need not be installed, saving space and CPU time. The Linux audio RPi page [Linuxaudio.org, 2014] lists many tips for tuning overall latency. Once the system is running, infrequent software updates should suffice, which reduces the chance of breaking the installed software, e.g. with incompatible libraries.

• Distributed computing. While the performance of a single notebook or PC can easily become a bottleneck, in the distributed network, audio computing performance scales with the number of modules.

• Distributed interface. With a single mouse and keyboard, you can control only a single input (such as a slider or knob) at a time. Optionally, the RPis can be equipped with their own sliders and knobs, enabling you to control them in parallel. Also, you may place the modules on, say, a large table, while on the virtual desktop of your notebook or PC, you are limited to the size of your screen.

• Authenticity. In a live performance, it is more comprehensible for the audience to see a musician put hands on physically existing modules and hear the resulting change of sound, rather than watching a PC user clicking and typing on a computer.

Still, a notebook or PC may be useful as host for controlling network connections between the modules.

2 The Hardware 

Our vision just integrates existing software and hardware, one may think. In fact, we have to carefully choose software that smoothly integrates with our RPis. We look at the RPi's hardware to better understand the software requirements.

2.1 Audio Connections 

For building an audio network, we have to decide what interconnects to use for audio transmission. Essential criteria are:

• Duplex operation. Synthesizer modules typically have both input and output. We do not want to add hardware to gain full duplex operation.

• Bandwidth. Bandwidth must be sufficiently high for carrying multiple channels.

• Audio and control data. For low-bandwidth data such as envelopes or frequency control, low-bandwidth connections (e.g. MIDI) should be supported to save overall bandwidth.

• Hardware protocol support. To save computing resources, low-level issues (e.g. parity check bits or serializing/deserializing) should be implemented in hardware.

The RPi's hardware connectors capable of transmitting audio include the analog 3.5 mm audio output jack, USB, HDMI, GPIO/I2S, and Ethernet (Fig. 2).



Figure 2: RPi Connectors Relevant for Audio 

2.1.1 Analog 3.5 mm Audio Output Jack

Users report glitches, crackles and pops when using the 3.5 mm jack, at least in the early days of the RPi. It apparently supports only 11 bits of resolution [Linuxaudio.org, 2014]. Most importantly, the RPi has no analog audio input. Analog audio cannot be fed back into the RPi without additional hardware; hence we do not pursue this jack.

2.1.2 USB 

Linux supports audio over USB. The RPi model B's built-in USB 2.0 connectors only support connecting a USB device, but not another USB host [Raspberry Pi Foundation, 2012d]. As with the analog audio jack, symmetrical host-to-host transmission is impossible for RPi model B. In theory, RPi model A's single USB port can make the RPi act as a device via a host-to-host USB cable, but nobody on the Web seems to have confirmed that the USB drivers support this mode.

2.1.3 Audio via HDMI 

While audio can be transmitted via HDMI, the RPi's built-in hardware does not support HDMI input; hence we do not pursue this option.

2.1.4 GPIO pins 

The RPi's GPIO pins are ideal for low-level input and output of binary data. For streaming audio data over GPIO, we would have to implement a full protocol stack in software, thus consuming a lot of computing resources and limiting bandwidth. Specific GPIO pins implement I2S with hardware support [Raspberry Pi Foundation, 2012c]. While this interface may provide sufficient performance (users report varying experiences on this issue [stackexchange.com, 2013]), we would need special hardware (e.g. an I2S router). Some users claim that the kernel needs to be patched with an I2S kernel module to achieve high performance for audio data over I2S [GmbH, 2013]. At the very least, special I2S audio drivers would have to be implemented to make audio applications aware of I2S.

2.1.5 TCP/IP or UDP over Ethernet 

The RPi's Ethernet port can be used for transmitting audio data. Neither TCP/IP nor UDP has been designed for realtime applications, but existing software for audio and video streaming over the internet shows that, with some effort, streaming is feasible as long as network bandwidth is sufficiently high. We pursue the approach of streaming audio over Ethernet.
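Whether the bandwidth is sufficiently high can be estimated with simple arithmetic: an uncompressed stream needs sample rate × bits per sample × number of channels. The sketch below is a rough estimate under stated assumptions: 48 kHz, 32-bit float samples (JACK's internal sample format) and no protocol header overhead. The only hardware figure used is the 100 Mbit/s link rate of the RPi model B's Ethernet port; the 30% headroom is an arbitrary illustrative value.

```c
/* Raw payload bitrate in Mbit/s of `channels` uncompressed audio
 * channels. Protocol headers and retransmissions are ignored. */
double stream_mbit_per_s(double sample_rate_hz, int bits_per_sample, int channels)
{
    return sample_rate_hz * bits_per_sample * channels / 1e6;
}

/* Maximum number of channels fitting into one direction of a
 * full-duplex link, keeping a fraction `headroom` (0..1) of the
 * link rate unused for overhead. */
int max_channels(double link_mbit_per_s, double headroom,
                 double sample_rate_hz, int bits_per_sample)
{
    double per_channel = stream_mbit_per_s(sample_rate_hz, bits_per_sample, 1);
    return (int)(link_mbit_per_s * (1.0 - headroom) / per_channel);
}
```

Each channel then needs about 1.5 Mbit/s, and max_channels(100.0, 0.3, 48000, 32) yields 45 channels per direction.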

3 The Software 

Preferring Ethernet for audio transmission, we next look into which software to choose and how to set it up. We prefer an out-of-the-box solution based on open source software.

3.1 Choosing Proper Audio Streaming Software

On Linux, there are competing sound servers for streaming audio data over a network. In our short survey, the following criteria are essential:

• Availability. The software must be available for ARMv6. Software that is not open source or not under active development is typically not available for this architecture.

• Latency. In modular synthesis, audio and control data typically follow a path running through many modules. Therefore, the sound server should provide low latency.

• Headless use. The RPis will typically be controlled remotely and therefore run headless, that is, without a display connected to them. No graphical environment such as X11 or a window manager should be required to run.

Audio streaming software like the Enlightened Sound Daemon (ESD) or Phonon is bound to a window manager. Therefore, ESD and Phonon are not valid candidates for our purpose. sndio is an audio server for OpenBSD; however, we are looking for a solution for Linux/ARM. The aRts sound server has been out of development since 2006 and is therefore not a viable choice. A quick internet search yields the following candidates:

• Network Audio System (NAS) 

• PulseAudio 

• JACK with netJACK


3.1.1 Network Audio System (NAS) 

The Network Audio System (NAS) [radscan.com, 1996-2013] does not (yet) list any ARM architecture as a supported platform. The man page of NAS states that the server automatically converts all data to the designed format or rate; that is, resampling may slow down overall performance.

3.1.2 PulseAudio 

PulseAudio supports streaming over networks [freedesktop.org, 2013; archlinux.org, 2012-2014]. However, latency seems to be a major problem; only recently have major improvements been announced [Lindner, 2013]. Also, PulseAudio resamples all data into some internal format and resamples it again for delivery. The RPi's limited computing performance should be saved for the actual audio processing of the synthesizer module's function.

3.1.3 JACK with netJACK 

In contrast to PulseAudio and NAS [jack-devel@jackaudio.org, 2012], JACK [jackaudio.org, 2006-2014] synchronizes all clients to one sound sink. JACK has been designed from the beginning with low latency in mind. Over the last few months, some remaining bugs that specifically appeared on the RPi/ARMv6 platform have been fixed. We decided to pursue JACK, using (jackdmp) version 1.9.9. On our control host notebook, there was a pre-installed JACK (jackdmp) version 1.9.8 that we continue using. Any newer version should also work.

3.1.4 netJACK1 vs. netJACK2 vs. JackTrip

netJACK1 is available for both JACK1 and JACK2. In JACK1, netJACK1 is loaded with the command jackd -R -d net, while netJACK2 is not available. In JACK2, netJACK1 is loaded with jackd -R -d netone, while netJACK2 is loaded with jackd -R -d net.

The graphical application qjackctl can be used to load and configure netJACK1 or netJACK2 as backend. However, as of qjackctl version 0.3.9 bundled with current NOOBS, several bugs render qjackctl effectively useless when used with the netJACK2 backend on the RPi. In particular, it uses netJACK1 options such as -o4 instead of -P 4 for setting up 4 output channels, and it had problems detecting an already running JACK instance. We prefer to use the RPis in headless mode anyway, i.e. without using qjackctl. Running qjackctl on the control host seems to be fine for our purposes.

The documentation of JackTrip [Caceres and Chafe, 2010] explains how to set up a single JackTrip server with a single JackTrip client. The JackTrip server is a stand-alone application that, when started, appears as a regular JACK client. When trying to start another JackTrip server instance, it complains that the associated UDP socket is already in use. The single JackTrip server instance spawns only a single readable client in qjackctl. The JackTrip documentation does not show any apparent way to connect multiple JackTrip clients. The latest ChangeLog entry of JackTrip dates from November 2010. As we need to connect multiple clients, we do not pursue JackTrip further.

3.2 Putting it All Together 

Next, we give step-by-step instructions for installing and configuring all software for an RPi-based modular synthesizer with JACK and netJACK.

3.2.1 Setting up NOOBS on the RPi

As of now, there is no distribution tailored for audio applications on the RPi. Instead, we use the New Out Of Box Software (NOOBS) version 1.3.2 (Debian 3.10.24+ #614 PREEMPT armv6l kernel) based on the Raspbian wheezy distribution. We follow the instructions on the download webpage [Raspberry Pi Foundation, 2014a] and on the screen display, which, along with a keyboard, we needed to attach to the RPi solely for the first installation. Once one system is running, no display or keyboard is needed any more. The SD card's contents can be copied to create another OS instance for another RPi.

When asked by NOOBS to select a distribution, we choose Raspbian (i.e. Debian wheezy). With a 16 GB class 4 SD card and a 10 MBit/s internet connection, the subsequent installation takes roughly 45 minutes, including several automatic reboots. Finally, the Raspberry Pi configuration tool raspi-config is executed. The preset defaults should be fine.

JACK including netJACK should already be installed. For developing and compiling JACK clients, you need to install the C header files for JACK with the command sudo apt-get install libjack-jackd2-dev, which installs the packages libdbus-1-dev and libjack-jackd2-dev. If there is no DHCP server on your network, do not forget to statically assign a unique IP address to each RPi and to set up a proper route to your network.

Now we have a basic NOOBS installation on the RPi. The login name on the RPi is pi, the password is raspberry.

3.2.2 Setting Up JACK on the RPi 

To automatically start a JACK slave instance on each RPi upon boot, put the following line into the /etc/rc.local script:

sudo -u pi /usr/bin/jackd -R -d net -n module-name >/dev/null 2>&1 &

The instance will then appear to the master JACK instance on the control host as a remote slave instance called module-name. Alternatively, JACK can be started implicitly by the client application. Say you have a low-pass filter implementation that connects to JACK with
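jack_status_t status;
/* JackNullOption does not set JackNoStartServer, so libjack may
   start jackd from ~/.jackdrc if no server is running yet */
jack_options_t options = JackNullOption;
client = jack_client_open (client_name, options, &status, server_name);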

and your .jackdrc configuration file in your home directory contains the following line:

/usr/bin/jackd -R -d net -n low-pass -C 3 -P 4

Then starting your client application will also start JACK. That is, in your /etc/rc.local script, you can also directly launch your client application.

3.2.3 Setting Up JACK on the Control Host

On our notebook, we use JACK 1.9.8. The JACK master instance is started with the command jackd -R -d alsa, or with whatever other backend you prefer over ALSA. After that, we load netJACK with the command jack_load netmanager, so that all JACK slaves on the RPis may connect to the JACK master.

Note that the JACK master on the control host must be started first. After that, boot the RPis. If JACK is started automatically on the RPi, it will then become visible to the control host. Use qjackctl on the control host to connect all RPis' inputs and outputs. Start playing your distributed RPi-based modular synthesizer.

3.3 Synthesizer Software 

Throughout this work, the author used very simple self-written JACK clients based on the simple_client.c example of the JACK distribution. In our out-of-the-box spirit, we want to apply existing LV2 plugins [LV2, 2014] that do not require a GUI for the actual synthesis. Therefore, we need a simple JACK client serving as host for LV2 plugins that examines a given LV2 plugin's I/O ports and exports them as JACK channels. The author does not know of existing software doing this job, but the effort should not be too large. This work still has to be done.

4 Advanced Module Identity and Identification

For simplicity and ease of use, each RPi should represent exactly one synthesizer module. Then each RPi has a dedicated, clearly distinct function, which helps to keep a clear overview of the whole system. This approach looks like a waste of computing resources if, for example, one RPi is dedicated as an oscillator. However, even tasks looking as simple as an oscillator may evolve into high complexity when sophisticated input controls are added, for example for morphing between sounds. We identify three options for identification: remote setup, SD card based identity, and shield based identity.

4.1 Remote Setup 

The RPi's SD card contains the software for all supported module types, e.g. oscillator, low-pass filter or reverb effect. The function of a specific RPi is determined by remotely configuring it from the control host.

4.2 SD Card Based Identity 

As the RPi uses its SD card as resident memory holding the complete OS and application software, the RPi gets a completely new identity by simply replacing the SD card. That is, we may create an oscillator SD card, a low-pass filter SD card, a reverb effect SD card, etc. The RPi represents a particular type of synthesizer module just by inserting the appropriate SD card. The RPi announces itself e.g. as oscillator, low-pass filter or reverb effect. The control host collects all announcements and presents all available modules to the user for wiring.

4.3 Shield Based Identity 

For most modules, it is useful to provide hardware knobs or sliders directly attached to the modules, using a shield mounted directly on the RPi. While this approach requires (a little) extra hardware and thus is not a pure out-of-the-box solution, it has substantial advantages:

• Visual module identification. The extra hardware gives the RPi a visual identity and emphasizes that RPi's dedicated function. Each shield can be individually labelled (e.g. "master reverb effect") and typically has a set of knobs or sliders that also may help identify its function.

• Parallel control of hardware knobs and sliders. When controlling a virtual knob or slider on a screen, you need to place the mouse pointer on it. That is, only one input control can be used at a time. Switching between knobs or sliders takes time for relocating the mouse pointer. Hardware knobs and sliders enable parallel use and fast switching.

• Module identity change by shield replacement. If the shield provides an identifier for the RPi that represents the module's intended function (e.g. an LADSPA plugin ID), the RPi may automatically start any associated software or plug-in that implements the module's function indicated by the shield.

This way, configuring an RPi as a dedicated module boils down to connecting a specific shield with knobs and sliders to it. The RPi's SD card holds the software for any supported module, and when a shield is plugged in, the RPi can determine which module to represent.
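The selection step described above can be sketched as a lookup from the identifier read off the shield to the software to launch. Everything in this sketch is hypothetical: the numeric IDs, the table entries and the command paths are made up for illustration, and actually reading the ID from the shield (e.g. from the EEPROM proposed in Sect. 4.4) is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping from a shield's module ID to a module name and
 * the command that would be launched, e.g. from /etc/rc.local. */
typedef struct {
    uint16_t id;
    const char *name;
    const char *command;
} module_entry_t;

static const module_entry_t module_table[] = {
    { 0x0001, "oscillator", "/usr/local/bin/osc-module" },
    { 0x0002, "low-pass",   "/usr/local/bin/lowpass-module" },
    { 0x0003, "reverb",     "/usr/local/bin/reverb-module" },
};

/* Return the table entry for a shield ID, or NULL for an unknown
 * shield (in which case one could fall back to remote setup). */
const module_entry_t *lookup_module(uint16_t id)
{
    for (size_t i = 0; i < sizeof module_table / sizeof module_table[0]; i++)
        if (module_table[i].id == id)
            return &module_table[i];
    return NULL;
}
```

Because the table lives on the SD card while the ID lives on the shield, the same SD card image can serve every module type, as described above.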

4.4 Shield Design 

While the author has not (yet) developed a shield, the design idea is simple and straightforward. We need circuitry connected to the RPi's GPIO pins that converts input from analog controls like knobs or sliders into digital signals. There are shields available with exactly this feature [abelectronics.co.uk, 2013; Raspberry Pi Foundation, 2012a; Modern Device, 2014]. Even cheap A/D converters are sufficient for low-frequency signals such as movements of knobs and sliders [Sklar, 2012] and are accessible via SPI [Brownell and others, 2013; Gzamboni, 2013]. Rotary encoders can be connected directly to the GPIO pins, as they simply consist of mechanical switches. The shield should contain a small serial EEPROM for uniquely identifying or describing the module's function and possibly storing a user-defined module label. The label must be stored on the shield, not on the RPi's SD card, as it names the module's function as stamped by the shield, regardless of the particular underlying RPi. Designing and producing proper shields for our synth modules might even emerge as a candidate for a tiny crowdfunding project.
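Since a rotary encoder is just two mechanical switches (channels A and B) producing a quadrature Gray-code sequence, decoding it on the RPi reduces to comparing the previous and current 2-bit state. The sketch below shows this state-table decoding in isolation. How the two GPIO levels are read (polling or edge interrupts) is left open, and in practice mechanical contact bounce would additionally need debouncing or filtering. Which physical direction counts as positive depends on the wiring.

```c
#include <stdint.h>

/* Step for each (previous << 2 | current) pair of 2-bit states
 * (bit 1 = A, bit 0 = B). The sequence 00 -> 10 -> 11 -> 01 -> 00
 * counts +1 per transition, the reverse sequence -1; no change and
 * invalid jumps (both bits flipping at once) count 0. */
static const int8_t quad_table[16] = {
     0, -1, +1,  0,
    +1,  0,  0, -1,
    -1,  0,  0, +1,
     0, +1, -1,  0
};

typedef struct {
    uint8_t prev;  /* last 2-bit state (A << 1 | B) */
    long position; /* accumulated steps (many encoders give 4 per detent) */
} encoder_t;

/* Feed the current levels of channels A and B (0 or 1). Returns the
 * step (-1, 0 or +1) and updates the accumulated position. */
int encoder_update(encoder_t *enc, int a, int b)
{
    uint8_t cur = (uint8_t)(((a & 1) << 1) | (b & 1));
    int step = quad_table[(enc->prev << 2) | cur];
    enc->prev = cur;
    enc->position += step;
    return step;
}
```

Ignoring invalid transitions, rather than guessing a direction, keeps a missed sample from producing a spurious jump in the controlled parameter.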

5 Evaluation 

Our attempt to set up a modular synthesizer using out-of-the-box software reveals remarkable limitations that should be considered as feature requests for the software that we discuss.

5.1 Audio Data Routing 
Figure 3: Routing via Master JACK Server

Probably the most obstructive issue is the central routing of all audio data via the master JACK instance. The netJACK2 approach assumes a single master instance [Stephane Letz et al., 2009]. Similarly, in netJACK1 a slave can only be connected to one master at a time. The master's backend (e.g. an ALSA sound device) determines the sample rate and format for all communication with all participating JACK slave instances. To prevent the master from becoming a bottleneck, we would prefer each RPi to host both a slave and a master JACK instance.

The qjackctl application follows the JACK and netJACK design and provides a GUI for configuring routing between the single master and multiple clients/slaves. If multiple masters were supported in a future version of JACK/netJACK, we would like to have an extended version of qjackctl capable of managing connections between two remote master instances.

To evaluate the impact of central routing, we measured the master JACK CPU load as a function of the number of audio channel connections. We connected three RPis to our notebook (Quad Core i5-2430M @ 2.4 GHz), used as control host for running the master JACK server. One of the RPis served as an array of n oscillator outputs; the other two modules just looped through data from their n input channels to their n output channels. That is, for each channel, audio data flows from the oscillator to the notebook, then to the first loop-through module and back to the notebook, then to the second loop-through module and back to the notebook, and finally to the ALSA backend device (Fig. 3). Thus, for n channels, there are 6n connections configured in qjackctl.

We measured the CPU load reported by jack_cpu_load() and varied the number of channels per module. Each box plot shows the range of CPU load over 480 samples (1 sample/s × 8 min). For n > 18 (i.e. more than 108 connections), severe problems like xrun errors and JACK crashes arose. Below this threshold, the system behaved smoothly with moderate load (Fig. 4). For comparison, the overall CPU load shown by xosview stayed below 0.7.
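The measurement can be summarized with two small helpers: the number of connections for n channels per module (6n, as stated above) and the five-number summary that underlies each box plot. Using linear interpolation between order statistics for the quartiles is an assumption; the text does not state which quartile convention was used for Fig. 4.

```c
#include <stdlib.h>

/* Connections configured in qjackctl for n channels per module. */
int connection_count(int n_channels)
{
    return 6 * n_channels;
}

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Quantile q in [0, 1] of n > 0 sorted values, interpolating
 * linearly between neighbouring order statistics. */
static double quantile_sorted(const double *s, size_t n, double q)
{
    double pos = q * (double)(n - 1);
    size_t lo = (size_t)pos;
    size_t hi = lo + 1 < n ? lo + 1 : lo;
    return s[lo] + (pos - (double)lo) * (s[hi] - s[lo]);
}

/* Five-number summary (min, Q1, median, Q3, max) of n > 0 CPU load
 * samples, e.g. 480 samples taken once per second; sorts in place. */
void five_number_summary(double *load, size_t n, double out[5])
{
    qsort(load, n, sizeof load[0], cmp_double);
    out[0] = load[0];
    out[1] = quantile_sorted(load, n, 0.25);
    out[2] = quantile_sorted(load, n, 0.50);
    out[3] = quantile_sorted(load, n, 0.75);
    out[4] = load[n - 1];
}
```

For example, connection_count(18) gives 108, the threshold above which xruns and crashes occurred.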
Figure 4: CPU Load on Master JACK Server


qjackctl often exited unexpectedly (with exit code 0) when a remote client reappeared after a reboot. Sometimes it did not recognize when the connection to a remote client broke down due to client shutdown (possibly due to a problem in netJACK2). The master JACK server crashed when trying to connect more than 21 channels to the ALSA backend.

5.2 Labelling of Modules 

On our control host notebook running qjackctl, by default all JACK slaves appear as clients with the name of the host they are running on. While this is reasonable default behaviour, we have a network of RPis with basically identical software setups. Therefore, all RPis will appear as JACK clients labelled "raspberry", the default host name of the Raspbian Linux distribution. We could configure individual hostnames for each RPi, but it is the type of synthesizer module that should be displayed rather than a host identifier. Also, we do not want to change the host name each time the RPi changes its module type. Luckily, netJACK2 provides the command line option -n to explicitly set the client name. For example, jackd -R -d net -n low-pass will result in a client called low-pass (Fig. 5). In netJACK1, there is no comparable option.


(Screenshot: qjackctl Connections window. Readable clients: low-pass with ports from slave 1 to 4, and system with capture_1 and capture_2. Writable clients: low-pass with ports to slave 1 to 3, and system with playback_1 and playback_2.)

Figure 5: Slave started with jackd -R -d net -n low-pass -C 3 -P 4

5.3 Number of Channels 

By default, a netJACK client is started with a number of audio input and output channels equal to that of the soundcard, i.e. mostly 2 for stereo sound. In contrast, a synthesizer module may have an arbitrary number of inputs and outputs. Luckily again, the netJACK2 backend provides the command line options -C, -P, -i, and -o to specify the number of audio input, audio output, MIDI input, and MIDI output channels, respectively (Fig. 5). In netJACK1, the corresponding options are named -i, -o, -I, and -O, respectively.

5.4 Labelling of Channels 

The module software should be able to configure the names of a JACK slave’s audio and MIDI input and output channels individually. For example, an oscillator may have a pitch control input, a noise content control input, a left channel audio output and a right channel audio output.

In JACK, port names can be set with the function jack_port_set_name(jack_port_t *port, const char *port_name). They are initially set to capture_n or playback_n for inputs or outputs, respectively. When the function is executed on the RPi, it refers to the port name locally shown on the RPi. Unfortunately, netJACK does not report slave port names to the master. Instead, on the master JACK instance, remote client channels always appear as from_slave_n, to_slave_n, midi_from_slave_n, and midi_to_slave_n (Fig. 5).


Of course, if software running on the control host has knowledge of all synthesizer module types that may appear in the network, it may derive channel labels from the slave’s name. This workaround, however, does not qualify as an out-of-the-box solution.
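A minimal Haskell sketch of this workaround, run on the control host, could look as follows. The table moduleTypes and the function channelLabels are hypothetical; the oscillator entry only reuses the example from Section 5.4, and mapping to_slave_n to the slave's n-th input and from_slave_n to its n-th output is an assumption about how a given setup is wired.

```haskell
-- Hypothetical table of module types. The key is the client name given to
-- a slave with jackd's -n option; the value holds labels for the slave's
-- audio inputs and outputs, in port order. Extend it with further types.
moduleTypes :: [(String, ([String], [String]))]
moduleTypes =
  [ ("oscillator", (["pitch", "noise"], ["left", "right"]))
  ]

-- Labels for the ports the master shows for a slave, as pairs of
-- "client:port" and label. Assumption: to_slave_n feeds the slave's n-th
-- input and from_slave_n carries its n-th output. Unknown types give Nothing.
channelLabels :: String -> Maybe [(String, String)]
channelLabels client = do
  (ins, outs) <- lookup client moduleTypes
  let port prefix n = client ++ ":" ++ prefix ++ show (n :: Int)
      inPorts  = map (port "to_slave_") [1 ..]
      outPorts = map (port "from_slave_") [1 ..]
  return (zip inPorts ins ++ zip outPorts outs)
```

For a client that is not in the table, such as one still named “raspberry”, channelLabels returns Nothing, which is the limitation that keeps this from being an out-of-the-box solution.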

5.5 Boot Time 

Our NOOBS-based setup takes almost one minute to boot an RPi. For an embedded system that you want to start working with immediately, this is far too long. There exist tailored kernels and software configurations for faster booting, reduced to the minimum required for the dedicated purpose. Choosing a fast SD card may speed up booting, as may using the kernel’s fastboot option. RPi users report tips and tricks that reduce boot time to less than 20 seconds [Raspberry Pi Foundation, 2012b].

6 Conclusions 

Linux on the Raspberry Pi is almost mature enough for implementing our vision of a modular synthesizer based on a network of RPis connected to a notebook or PC as control host. The most significant outstanding problem in our setup is that all audio and MIDI data is routed through the master JACK instance running on the control host. Instead, the RPis should be able to communicate directly with each other. This approach, however, would require multiple JACK master instances to be part of the communication network, while the netJACK architecture currently assumes a single master instance.

As the RPis run in headless mode, we would like to run an application on the control host that is capable of configuring the complete system. In particular, setting up direct connections between two RPis (once this is supported) reaches beyond the scope of qjackctl. An extended version of qjackctl could turn out to be essential for our vision. The overall stability of qjackctl should also be improved.

To use existing LV2 plugins, a JACK client serving as an LV2 plugin host is still missing.

The author plans to get in contact with the authors of the software discussed here to solve the remaining issues. The example and testing code of this study is available at http://www.soundpaint.org/rpi-modular-synth/. The author wants to thank the anonymous reviewer for pointing out some points missing for a true out-of-the-box spirit and for a further reference.
Abstract

A previously implemented realtime algorithmic composition system with a live coding interface had rhythm functions which produced stylistically limited output and lacked flexibility. Through a cleaner separation between the generation of base rhythmic figures and the generation of variations at various rhythmic densities, flexibility was gained. These functions were generalized to make a greater variety of output possible. As examples, L-systems were implemented, as well as the use of ratios for generating variations at different rhythmic densities. This increased flexibility should enable the use of various standard algorithmic composition techniques and the development of new ones.

Keywords 

algorithmic composition, live coding, Haskell, L-systems, rhythm

1 Introduction 

A system for realtime algorithmic composition was first presented in (Bell, 2011), and improvements were then described in (Bell, 2013). The intention of that system was to be able to do realtime algorithmic composition, primarily through a live coding interface. It was concluded that while the interonset interval (IOI) function and density function provided in the Conductive library could yield somewhat useful results, improvements could be made.

This paper describes attempted improvements in this area. First, this paper briefly reviews how those functions were implemented previously and describes their output. It then explains the problems with that implementation and output. The paper then proceeds to describe the newly implemented version and its advantages, namely the use of a higher-order function to gain a more modular system. Two example inputs to this higher-order function were implemented and are described. Finally, conclusions are drawn and directions for future research are proposed.


2 Summary of Previous Rhythm 
Generation Technique 

2.1 A Brief Review of Conductive 

Conductive is a library for the Haskell programming language used for managing concurrent processes for realtime music. In addition to providing functions for managing those concurrent processes, it has some features for representing musical time and for algorithmic composition.

Concurrent musical processes are represented by a data structure called a Player. The Player refers to two functions: an action, which can be any IO function and which it runs repeatedly; and an IOI function, which determines the wait times, called interonset intervals (IOIs), that are interleaved between calls to the action function. More information on Conductive and performing with it can be found in (Bell, 2011) and (Bell, 2013). See Figure 1 for a graphical representation, originally included in (Bell, 2011).

[Diagram: the function play takes two arguments, a musical environment and a Player name. play gets the Player's action function and forks the action to a new thread; it then gets the Player's IOI function, runs it, and waits before the loop repeats.]

Figure 1: the Play loop, with Player, action 
function, and IOI function 
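The loop in Figure 1 can be approximated by the following Haskell sketch. It is not Conductive's code: the Player record, beatsToMicros and playLoop are hypothetical simplifications that use a fixed tempo and stop after a given number of events, whereas in Conductive the play function also takes a musical environment.

```haskell
import Control.Concurrent (forkIO, threadDelay)

-- Hypothetical, simplified Player: a name, an action (any IO action) and an
-- IOI function giving the wait, in beats, after the n-th call.
data Player = Player
  { playerName :: String
  , action     :: IO ()
  , ioiFn      :: Int -> Double
  }

-- Convert a duration in beats to microseconds at a fixed tempo in BPM.
beatsToMicros :: Double -> Double -> Int
beatsToMicros bpm beats = round (beats * 60 / bpm * 1000000)

-- Run the loop for a fixed number of events: fork the action to a new
-- thread, then wait one IOI before the next call. Returns the number of
-- beats waited in total.
playLoop :: Double -> Int -> Player -> IO Double
playLoop bpm events p = go 0 0
  where
    go n elapsed
      | n >= events = return elapsed
      | otherwise = do
          _ <- forkIO (action p)
          let ioi = ioiFn p n
          threadDelay (beatsToMicros bpm ioi)
          go (n + 1) (elapsed + ioi)
```

In GHCi, playLoop 120 8 (Player "hat" (putStrLn "tick") (const 0.5)) prints “tick” eight times, one every quarter of a second (half a beat at 120 BPM). Forking the action keeps a slow action from delaying the next wait.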


2.2 Rhythm in Conductive 

As described in (Bell, 2013), this author had been experimenting with reading IOI values from something called a density map. This concept involved two parts: the generation of rhythmic figures and the generation of an ordered stack of variations at lesser and greater rhythmic density. Both of these parts used stochastic methods and were finally joined in a single function.

A higher-level abstraction was developed to 
generate a list and a set of rhythmically similar 
lists of greater and lesser density and store them 
in a table indexed by level of density. Doing so 
increased the likelihood that two lists would be 
perceived as having a rhythmic relationship and 
decreased the chance that an audience would 
perceive a kind of discontinuous state change 
when switching from one to another. 

The base rhythmic figure was an ordered list of IOI values expressed in terms of beats, whole or fractional. To generate it, a performer selected a core unit which was used to generate potential IOI values. Selection of the core unit, in conjunction with the length of the pattern, largely determined the metrical feel of the pattern. A list of scalars was determined by the performer, from which the function randomly selected a user-specified quantity to multiply with the core unit. The user specified a number of subphrases to generate and the length of those phrases in terms of the number of scalars to use. The subphrases were then generated by selecting the scalars and multiplying them by the core unit. Finally, a user-specified number of subphrases were chosen at random by the function. The user determined the length of the final phrase in terms of beats. If the length of the concatenated subphrases was less than the specified length, the final IOI value was padded. If the length exceeded the specified length, the final IOI value was truncated. The repetition of values and subphrases within the final figure tended to give it a musical quality lacking in a list of purely random numbers. For a complete example, see (Bell, 2013).
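The procedure above can be sketched in Haskell as follows. To keep the sketch deterministic, lists of indices replace the random choices, and the names pick, subphrase, fitToLength and baseFigure are hypothetical rather than functions from Conductive.

```haskell
-- Select an element by index, wrapping around the list length.
pick :: [a] -> Int -> a
pick xs i = xs !! (i `mod` length xs)

-- One subphrase: chosen scalars multiplied by the core unit.
subphrase :: Double -> [Double] -> [Int] -> [Double]
subphrase unit scalars idxs = [ unit * pick scalars i | i <- idxs ]

-- Force a figure to last exactly len beats: pad the final IOI if the figure
-- is too short; if too long, truncate the IOI that crosses the end and drop
-- everything after it.
fitToLength :: Double -> [Double] -> [Double]
fitToLength len [] = [len]
fitToLength len xs
  | total < len = init xs ++ [last xs + (len - total)]
  | otherwise   = cut 0 xs
  where
    total = sum xs
    cut acc (y : ys)
      | acc + y >= len = [len - acc]
      | otherwise      = y : cut (acc + y) ys
    cut _ [] = []

-- Base figure: build the subphrases, concatenate a chosen sequence of them,
-- then fit the result to the requested length in beats.
baseFigure :: Double -> [Double] -> [[Int]] -> [Int] -> Double -> [Double]
baseFigure unit scalars subIdxs order len =
  fitToLength len (concatMap (pick subs) order)
  where
    subs = map (subphrase unit scalars) subIdxs
```

For example, baseFigure 0.25 [1,2] [[0,1],[1,1]] [0,1,0] 2.0 concatenates the subphrases [0.25,0.5], [0.5,0.5] and [0.25,0.5] (2.5 beats) and truncates the final IOI, giving [0.25,0.5,0.5,0.5,0.25].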

Given a particular figure, a series of related patterns was generated in which the rhythmic density was increased or decreased. A large number of such patterns was generated so that a stack of patterns from very low rhythmic density to very high rhythmic density resulted, with the original figure somewhere in the middle. Those variations were generated in one of two ways, depending on whether they were to have greater or lesser density. When reducing density, one value from the figure was chosen at random and combined with a neighboring value. That new value was inserted into the figure in place of the two selected values. The less dense variation was then subjected to the same process recursively until the figure contained only a single item, with the figure at each step added to a list. For increasing density, a value was chosen at random and replaced with two items: an item of lesser value from a list of potential IOI values, and the difference between the original IOI value and the lesser value. The resulting figure was subjected to the same process, again recursively and retaining each version, until the figure consisted of a list of the smallest of the potential IOIs. By concatenating the original figure with the lists of greater and lesser density, a table was generated. For a complete example of this, see (Bell, 2013).
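The two operations can be sketched as below, again with explicit indices in place of random choices. The names thin, thicken and thinStack are hypothetical, and always taking the largest smaller potential IOI in thicken is a simplification of choosing one at random.

```haskell
-- Lower density: merge the value at index i (wrapped) with its right-hand
-- neighbour, or with its left-hand one if it is the last value.
thin :: Int -> [Double] -> [Double]
thin i xs
  | n < 2      = xs
  | j == n - 1 = take (j - 1) xs ++ [xs !! (j - 1) + xs !! j]
  | otherwise  = take j xs ++ [xs !! j + xs !! (j + 1)] ++ drop (j + 2) xs
  where
    n = length xs
    j = i `mod` n

-- Raise density: replace the value at index i (wrapped) by the largest
-- potential IOI below it and the remaining difference. Values with no
-- smaller potential IOI are left as they are.
thicken :: [Double] -> Int -> [Double] -> [Double]
thicken potentials i xs
  | null xs || null smaller = xs
  | otherwise = take j xs ++ [s, x - s] ++ drop (j + 1) xs
  where
    j       = i `mod` length xs
    x       = xs !! j
    smaller = filter (< x) potentials
    s       = maximum smaller

-- Successively thinner variations down to a single value, with a list of
-- indices standing in for the random choices.
thinStack :: [Int] -> [Double] -> [[Double]]
thinStack (i : is) xs
  | length xs >= 2 = let ys = thin i xs in ys : thinStack is ys
thinStack _ _ = []
```

For instance, thinStack (repeat 0) [1,0.5,0.5,1] yields [[1.5,0.5,1.0],[2.0,1.0],[3.0]], and thicken [0.25,0.5,1] 0 [1,1] yields [0.5,0.5,1.0]. Both operations preserve the total length of the figure in beats.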

An integer value representing the density 
ranking, with 0 being the least dense pattern, 
was assigned to each figure in the table. That 
table, along with the list of potential IOIs and 
the total length of the figure, was stored in a 
data structure called an IOIMap. 

The function call to execute this process looks 
like this, containing Ints, Doubles, and lists of 
each as arguments. The function returned an 
IOIMap containing the density map based on 
the generated rhythmic figure: 

mOO <- iOIandRTfromPhrase 0.25 2 2 4 [2] 2 [2] 4.0 3

Based on a user-specified density value, a particular IOI pattern is chosen from the table. The user queries the table with a value between 0 and 1, and a linear conversion to a list index is done. The value returned is the IOI pattern at that index. Based on the current beat, an IOI value is returned from that pattern.
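A sketch of this lookup in Haskell follows. Whether the linear conversion rounds or truncates is not stated above, so rounding is an assumption, as is returning the IOI of the slot in which the current beat falls; patternFor and ioiAtBeat are hypothetical names.

```haskell
-- Linear conversion of a density value to a table index (0 = least dense
-- pattern). Values outside [0,1] are clamped; rounding is an assumption.
patternFor :: [[Double]] -> Double -> [Double]
patternFor table d = table !! idx
  where
    lastIdx = length table - 1
    clamped = max 0 (min 1 d)
    idx     = round (clamped * fromIntegral lastIdx)

-- IOI for the current beat: the IOI of the slot in which the beat falls,
-- treating the pattern as a loop. Assumes a non-empty list of positive IOIs.
ioiAtBeat :: [Double] -> Double -> Double
ioiAtBeat pat beat = go 0 pat
  where
    len = sum pat
    pos = beat - len * fromIntegral (floor (beat / len) :: Int)
    go _ [x] = x
    go acc (x : xs)
      | pos < acc + x = x
      | otherwise     = go (acc + x) xs
    go _ [] = error "ioiAtBeat: empty pattern"
```

With a five-entry table, a density of 0.5 selects the middle pattern (index 2); ioiAtBeat [1,0.5,0.5,2] 1.25 returns 0.5, because beat 1.25 falls in the second slot.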

Density values can vary with time. One method for doing so is employing a TimespanMap. A commonly observed pattern was setting the timing of particular values. It is often desired that values change over time but at different rates. TimespanMaps are structures for handling such cases. Rather than specifying the exact timing of a value, a TimespanMap specifies the range of time in which that value can occur. TimespanMaps are maps or dictionaries with intervals as keys to any kind of value. Another parameter of the structure is a specified length at which it loops. When a time is passed to the dictionary, the interval that time falls in is determined to be the key to use, and the corresponding value for that interval is returned. When the time value passed to the TimespanMap exceeds those for which it is defined, it loops to return an appropriate value. For a graphical explanation, see Figure 2 in (Bell, 2013).
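A TimespanMap lookup might be sketched as follows. The TimespanMap type here is a hypothetical simplification (half-open intervals stored in a list), not the data type defined in Conductive.

```haskell
-- Hypothetical TimespanMap: half-open time intervals [start, end) as keys
-- to any kind of value, plus the length after which the map loops.
data TimespanMap a = TimespanMap
  { loopLength :: Double
  , spans      :: [((Double, Double), a)]
  }

-- Look up the value for a time; times beyond the loop length wrap around.
-- Returns Nothing if the time falls in a gap between intervals.
lookupTime :: TimespanMap a -> Double -> Maybe a
lookupTime (TimespanMap len xs) t =
  case [ v | ((start, end), v) <- xs, start <= t', t' < end ] of
    (v : _) -> Just v
    []      -> Nothing
  where
    t' = t - len * fromIntegral (floor (t / len) :: Int)
```

With m = TimespanMap 8 [((0,4),0.2),((4,6),0.5),((6,8),0.9)], a density value of 0.2 is returned for beat 9, because 9 wraps around to beat 1 of the eight-beat loop.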
[Diagram: the previous IOIMap-generating function takes Ints, Doubles, lists of Ints and lists of Doubles as arguments, with the base pattern function and density function built in, and returns an IOIMap. The generalized IOIMap-generating function takes Doubles, a base pattern function and a density function as arguments and returns an IOIMap.]

Figure 2: a comparison of the previous method and the new method


2.3 Problems Identified in the Previous 
Paper 

The style of patterns produced was limited, both in the generation of rhythmic figures and in the generation of their variations at various rhythmic densities. More inconvenient still was the fact that other methods could not be tested without rewriting the core functions. A more modular solution was sought.

3 New Method: a Generalized 
Function for Patterns and 
Variations 

To solve the problem described above, it was decided to rewrite the function that handled density and base patterns, generalizing it into a higher-order function that takes functions as arguments. Hutton defines the term as follows: “a function that takes a function as an argument or returns a function as a result is called higher-order.” (Hutton, 2010) In this case it takes two functions as parameters: one for the generation of the base pattern and a second for determining how the density of an input should be increased. The function is passed those two functions and a parameter determining how long the rhythmic figures should be. It uses the pattern function to produce a base pattern. It then processes the base pattern with the density function to create the rhythmic variations. It returns an IOIMap, which includes a set of TimespanMaps mapping time intervals to values of next beats, ordered according to their density value. The new function has been given the temporary name newIOIMap2, to be used until a better name is determined.
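The shape of such a higher-order function can be sketched in Haskell as follows. This is not newIOIMap2 itself: IOIMapSketch, untilStable, newIOIMapSketch and the two example inputs are hypothetical, the sketch is pure rather than random, and it only builds the denser variations, listing the base pattern at rank 0.

```haskell
-- Hypothetical stand-in for an IOIMap: the figure length and a table of
-- variations indexed by density rank.
data IOIMapSketch = IOIMapSketch
  { figureLength :: Double
  , densityTable :: [(Int, [Double])]
  } deriving Show

-- Apply f repeatedly, keeping every result, until it stops changing.
untilStable :: Eq a => (a -> a) -> a -> [a]
untilStable f x
  | y == x    = [x]
  | otherwise = x : untilStable f y
  where
    y = f x

-- The generalized function: both the base-pattern function and the
-- density-increasing function are arguments.
newIOIMapSketch :: (Double -> [Double]) -> ([Double] -> [Double]) -> Double -> IOIMapSketch
newIOIMapSketch patternFn densify len =
  IOIMapSketch len (zip [0 ..] (untilStable densify (patternFn len)))

-- Example pattern function: one-beat IOIs filling the figure.
evenBeats :: Double -> [Double]
evenBeats len = replicate (round len) 1

-- Example density function: halve the first value above a limit.
halveFirst :: Double -> [Double] -> [Double]
halveFirst limit xs = case break (> limit) xs of
  (pre, x : post) -> pre ++ [x / 2, x / 2] ++ post
  _               -> xs
```

newIOIMapSketch evenBeats (halveFirst 0.5) 2 produces ranks 0 to 2, from [1.0,1.0] to [0.5,0.5,0.5,0.5]. Swapping in another pattern or density function only requires passing a different argument, which is the modularity argued for above.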

The benefit of doing so is that the basic structure is already available before a performance and does not need to be coded at that time or recoded when the current method for generating patterns is no longer useful. That means that generating patterns and then making a table of values which can be read from according to a density value can be accomplished more easily and in a greater variety of ways. John Hughes writes in his essay “Why Functional Programming Matters” that higher-order functions are one of two important kinds of “glue” that increase modularity, “the key to successful programming” (Hughes, 1989). The use of higher-order functions in programming for aesthetic output has been described in (McDermott et al., 2010).

As an example of an alternate method for increasing density, a function for increasing density by ratio has been implemented. As examples of functions for generating base rhythms, two types of L-systems were implemented.

3.1 Density by Ratio 

This method applies when generating variations of increased rhythmic density.

The user specifies a list of ratios, a lowest target value, and a limit to how small the IOI values in the rhythmic figure can be. A value is selected at random from the rhythmic figure, and a ratio is chosen at random from the list of ratios provided by the user. The ratio is applied to the value, and the result is subtracted from the original value, so that the two resulting values sum to the original. These two new values are then shuffled and inserted into the rhythmic figure in place of the original value. The new rhythmic figure is stored in a list, and the process is repeated on this figure. This process is carried out recursively until all of the IOI values in the rhythmic figure are equal to or less than the user-specified lowest target value, producing a stack of increasingly dense rhythmic figures. The code for this procedure can be seen in the functions “densifier” and “densifier2”.

Consider an example in which a list, [1,1,1,1] 
is progressively densified according to a ratio of 
0.5, with 0.25 being the lowest possible value. 
let addL unit phraseLength ratios name = do
      ioimap <- newIOIMap2 0 phraseLength (generateDensities2 unit ratios) $ lsysTest4
      rs +<a (name,ioimap)

Figure 3: an example of using the new higher-order function, newIOIMap2


In this example, “it” is the ghci reference to the 
output of the previous command. 

*> densifier 0.25 [0.5] [1,1,1,1] 

[1.0,1.0,1.0,0.5,0.5] 

*> densifier 0.25 [0.5] it 
[0.5,0.5,1.0,1.0,0.5,0.5] 

*> densifier 0.25 [0.5] it 
[0.5,0.5,1.0,0.5,0.5,0.5,0.5] 

*> densifier 0.25 [0.5] it 
[0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5] 

*> densifier 0.25 [0.5] it 
[0.25,0.25,0.5,0.5,0.5,0.5,0.5,0.5,0.5] 

This process would continue until all of the 
values in the list are equal to 0.25. 
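The ratio-based densification can be sketched as follows. The sketch is not the densifier function from the paper's code: index pairs replace the random choice of value and ratio, the split pair is not shuffled, and the names splitByRatio, densifyStep and densityStack are hypothetical.

```haskell
-- Split one value: apply the ratio, then subtract that part from the
-- original, so the two new values sum to it.
splitByRatio :: Double -> Double -> [Double]
splitByRatio r x = [r * x, x - r * x]

-- One step: among the values still above the lowest target, split the
-- k-th (wrapped) using the ratio at index ri (wrapped).
densifyStep :: Double -> [Double] -> (Int, Int) -> [Double] -> [Double]
densifyStep lowest ratios (k, ri) xs =
  case [ i | (i, x) <- zip [0 ..] xs, x > lowest ] of
    [] -> xs
    candidates ->
      let i = candidates !! (k `mod` length candidates)
          r = ratios !! (ri `mod` length ratios)
      in take i xs ++ splitByRatio r (xs !! i) ++ drop (i + 1) xs

-- The stack of increasingly dense figures, until every value is at or
-- below the lowest target. Assumes all ratios lie strictly between 0 and 1.
densityStack :: Double -> [Double] -> [(Int, Int)] -> [Double] -> [[Double]]
densityStack lowest ratios choices xs
  | all (<= lowest) xs = []
  | otherwise = case choices of
      [] -> []
      (c : cs) ->
        let ys = densifyStep lowest ratios c xs
        in ys : densityStack lowest ratios cs ys
```

With densityStack 0.25 [0.5] (repeat (0, 0)) [1,1,1,1], the first step gives [0.5,0.5,1.0,1.0,1.0], and after twelve steps all sixteen values equal 0.25. Restricting the choice to values above the target keeps every step productive.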

3.2 Explanation of L-systems 

Lindenmayer systems, abbreviated to L-systems, were developed by Aristid Lindenmayer in 1968 as “a theory of growth models for filamentous organisms” (Lindenmayer, 1968). They have since been used by many for the simulation of plant growth, for visual art, and, to a lesser extent, music.

They are string-rewriting systems in which an input string, called an axiom, is transformed according to a set of rules in which each item in the string (a predecessor symbol) is rewritten as a successor string. By feeding this output back through the rule set, successive generations can be obtained (DuBois, 2003).

Here is a small example of a rule set, input, and seven generations (Supper, 2001):

  rules:   a -> b
           b -> ab
  input:   a
  output:  b
           ab
           bab
           abbab
           bababbab
           abbabbababbab
           bababbababbabbababbab

L-systems which do not have one-to-one string replacement rules grow in length rapidly, as seen above, and require users to employ techniques to deal with that size (DuBois, 2003).
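The rewriting step and the rapid growth can be illustrated with a small Haskell sketch; rewrite, generations and fibRules are hypothetical names, with rules kept as an association list of predecessor symbols and successor strings.

```haskell
-- Rewrite every symbol of a string in parallel; symbols without a rule
-- are kept unchanged.
rewrite :: [(Char, String)] -> String -> String
rewrite rules = concatMap (\c -> maybe [c] id (lookup c rules))

-- The axiom followed by all successive generations (an infinite list).
generations :: [(Char, String)] -> String -> [String]
generations rules = iterate (rewrite rules)

-- The rule set a -> b, b -> ab from the example above.
fibRules :: [(Char, String)]
fibRules = [('a', "b"), ('b', "ab")]
```

take 8 (generations fibRules "a") reproduces the axiom and the seven generations listed above. Their lengths, 1, 1, 2, 3, 5, 8, 13, 21, are Fibonacci numbers, so each generation is roughly 1.6 times longer than the previous one; one simple way to keep the size in check is to stop at a maximum length, e.g. takeWhile ((<= 100) . length).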

3.3 History of L-systems in 
Algorithmic Composition 

L-systems have been used in a variety of ways 
for algorithmic composition. Some examples 
from the literature are listed below. 

L-systems are frequently used for pitch content. Supper describes the use of L-systems and cellular automata for algorithmic composition (Supper, 2001). Langston used L-systems to choose from previously composed musical phrases (Langston, 1989). Morgan uses a system somewhat similar to Langston's, in which previously generated pattern fragments are chosen and assembled according to an L-system-generated template (Morgan, 2007). One of the most complete and useful discussions of using L-systems for music is the dissertation by R. Luke DuBois (DuBois, 2003), which describes various methods for generating patterns of pitches in monophonic melody lines and chords.

Worth and Stepney describe a set of L-system-selected rules by which note duration is progressively transformed (Worth and Stepney, 2005). Use of L-systems for duration or rhythm is described by Kaliakatsos-Papakostas, Floros, et al., including the use of what they call FL-systems, in which L-system output is constrained in length (Kaliakatsos-Papakostas et al., 2012). Kitani and Koike have also described a method of generating rhythms from L-systems in combination with a learning algorithm (Kitani and Koike, 2010). Liou, Wu, and Lee use L-systems to compute the complexity of rhythms (Liou et al., 2010).

For more about L-systems, readers are referred first to the dissertations of DuBois (DuBois, 2003) and Manousakis (Manousakis, 2006), and then to the other items listed above.

3.4 The L-system Function 
Implemented for this System 

The initial intention for using L-systems with the higher-order function described above is to generate the base rhythms from which the density table described above can be generated.

The module itself contains functions for generating a string output from an axiom, a rule set, and the generation number. Using the map function, a set of several generations can be obtained.

The rule set is notated with a colon rather 
than the traditional arrow for speed of entry. 
The predecessor symbol and successor string 
are written without spaces and separated by a 
colon. Each production rule must be separated 
by a space. The previous example can be run 
in ghci as follows: 

*LSystem> let rules = "a:b b:ab" 
*LSystem> getGeneration2 1 rules "a" 

"a" 

*LSystem> getGeneration2 2 rules "a" 

"b" 

*LSystem> getGeneration2 3 rules "a" 

"ab" 

*LSystem> getGeneration2 4 rules "a" 
"bab" 

*LSystem> getGeneration2 5 rules "a" 
"abbab" 

A more complicated example follows: 

rules: "a:ab b:acd d:gx e:abc f:ga g:d" 
axiom: "abcdefg" 

A symbol which has no rule is kept as-is. This is the same as if the rule were to repeat the symbol, such as “c -> c”.

In this case, the output in the interpreter of the first three generations of this L-system is:

*LSystem> getGeneration2 1 "a:ab b:acd 
d:gx e:abc f:ga g:d" "abcdefg" 

"abcdefg" 

*LSystem> getGeneration2 2 "a:ab b:acd 
d:gx e:abc f:ga g:d" "abcdefg" 
"abacdcgxabcgad" 

*LSystem> getGeneration2 3 "a:ab b:acd 
d:gx e:abc f:ga g:d" "abcdefg" 

"abacdabcgxcdxabacdcdabgx" 

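The colon notation can be parsed with a few lines of Haskell. The sketch below is a hypothetical reimplementation, not the module's getGeneration2, but it follows the same conventions: generation 1 is the axiom, and symbols without a rule are kept.

```haskell
-- Parse the colon notation, e.g. "a:b b:ab": rules are separated by
-- spaces, each written as predecessor:successor.
parseRules :: String -> [(Char, String)]
parseRules = map parseRule . words
  where
    parseRule (p : ':' : successor) = (p, successor)
    parseRule r = error ("malformed rule: " ++ r)

-- Generation n of an L-system, counted from 1 (generation 1 is the axiom).
-- Symbols without a rule are kept as they are.
generationN :: Int -> String -> String -> String
generationN n ruleText axiom = iterate step axiom !! (n - 1)
  where
    rules = parseRules ruleText
    step  = concatMap (\c -> maybe [c] id (lookup c rules))
```

For instance, generationN 3 "a:ab b:acd d:gx e:abc f:ga g:d" "abcdefg" returns "abacdabcgxcdxabacdcdabgx", matching the interpreter output above.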
Two methods have been tested:

• direct output of IOI values

• lists of value-transforming functions applied in sequence to a base value

The direct output of IOI values means that given an axiom, a rule set, the generation number, and a list of potential IOI values, the function will return a list of IOI values. How those IOI values are assigned to the symbols is a matter for which a large variety of options exist. One simple choice is to randomly assign a value to each unique symbol. The string is then rewritten as that list of numeric values. This example illustrates such a method. The list of values ranges from 0.25 to 1.25, containing every step of 0.25.

*LSystem> getGeneration2 5 rules "a" 
"abbab" 

*LSystem> let a = it 
*LSystem> randomFinalizer2 
[0.25,0.5..1.25] a 
[1.25,0.25,0.25,1.25,0.25] 

*LSystem> randomFinalizer2 
[0.25,0.5..1.25] a 
[1.25,1.0,1.0,1.25,1.0] 

*LSystem> randomFinalizer2 
[0.25,0.5..1.25] a 
[0.5,0.75,0.75,0.5,0.75] 
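The assignment of values to unique symbols can be sketched as follows. finalizeWith is a hypothetical stand-in for randomFinalizer2: it takes one index per unique symbol instead of choosing values at random, so that the first and third outputs above can be reproduced.

```haskell
import Data.List (nub)

-- Give each unique symbol (in order of first appearance) a value chosen by
-- index from the list of potential IOIs, then rewrite the string as those
-- values. Needs at least one index per unique symbol.
finalizeWith :: [Double] -> [Int] -> String -> [Double]
finalizeWith values idxs str = map valueOf str
  where
    chosen    = [ values !! (i `mod` length values) | i <- idxs ]
    table     = zip (nub str) chosen
    valueOf c = maybe (error ("no value for symbol " ++ [c])) id (lookup c table)
```

finalizeWith [0.25,0.5..1.25] [4,0] "abbab" gives [1.25,0.25,0.25,1.25,0.25], and the indices [1,2] give [0.5,0.75,0.75,0.5,0.75]. Because every occurrence of a symbol gets the same value, the repetition structure of the L-system output carries over into the rhythm.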

In the case of using transform, the rules of the L-system are mathematical functions that modify a numerical value. First the output of the L-system is similarly rewritten, with one mathematical function randomly chosen for each unique symbol in the string. Given a starting value and that list of mathematical functions, the number is passed through the list so that the output of one function becomes the input of the next. The changes are accumulated so that each step in the transformation of the initial number is kept. That series of numbers is then processed as deltas on which the density function will generate variations. Here is a very simple example of a list of functions processing a value:

*> transform 2 [(2 + ),((-1) +),(3 *)]
[2,4,3,9] 

Here is an application of using the transform 
function on the output of an L-system. 

*LSystem> getGeneration2 5 "a:b b:ab" "a"
"abbab"

*LSystem> let a = it 
*LSystem> let b = nub a 
*LSystem> b 
"ab" 

*LSystem> let c = zip (map (\x -> [x]) 
b) [(1+),(0.5*)] 

*LSystem> let d = flatFinalizer c a 
*LSystem> :t d 
d :: [Double -> Double]

*LSystem> transform 2 d 
[2.0,3.0,1.5,0.75,1.75,0.875] 

In this example, the variable “d” is the output of the function flatFinalizer, which converts the symbols in a string to their equivalents in a dictionary. In this case, the dictionary is “c”, which maps each unique symbol of the L-system output (the list “b”, here “ab”) to one of the operations above. The ghci command “:t” shows the type of an expression, and in this case is used to show that d is a list of functions which take a Double and return a Double. The nub function returns a list in which all duplicate items have been removed. In this case, “d” would expand to:

*LSystem> transform 2 [(1+),(0.5*), 
(0.5*),(1+),(0.5*)] 
[2.0,3.0,1.5,0.75,1.75,0.875] 
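Both steps can be written in a few lines of Haskell. transformSketch and toFunctions are hypothetical stand-ins for transform and flatFinalizer; unlike the dictionary “c” above, whose keys are one-character strings, this sketch keys the dictionary by Char.

```haskell
-- Pass a starting value through a list of functions, keeping every
-- intermediate result, as transform does in the examples above.
transformSketch :: a -> [a -> a] -> [a]
transformSketch = scanl (flip ($))

-- Rewrite a string as a list of functions using a dictionary from symbols
-- to functions, the role flatFinalizer plays above. Symbols missing from
-- the dictionary are skipped.
toFunctions :: [(Char, a -> a)] -> String -> [a -> a]
toFunctions dict str = [ f | c <- str, Just f <- [lookup c dict] ]
```

transformSketch 2 (toFunctions [('a', (1 +)), ('b', (0.5 *))] "abbab") evaluates to [2.0,3.0,1.5,0.75,1.75,0.875], the same list as above. Using scanl rather than a fold keeps every intermediate value, which is what makes the result usable as a list of IOIs.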

The final output in both of these examples, 
i.e. the list of Doubles, is used as a list of IOI 
values. Those values are then processed into a 
density map as described in sections 2 and 3.1. 

4 Conclusion 

An example of the function call now used to generate the IOIMap, including the partially applied function generateDensities2 and the function lsysTest4, can be found in Figure 3.

A brief example of using the L-system-based method described above can be heard at this URL: http://renickbell.net/sound/renick-bell-fractal-beats-test-140209-b.mp3

In this example, a collection of 100 density 
maps was created from a single L-System - 
“a:ab b:c c:abc d:ded e:aabb f:ga g:d” “abcdefg” 
- using the direct random selection of IOI values 
described above. Through the performance, 17 
of those are auditioned using a variety of audio 
sample sets as well as live modification of the 
envelopes which control the density level. 

Another brief example can be found at: http://renickbell.net/sound/renick-bell-fractal-beats-140125.mp3

In the near future, this code should be cleaned up and added to one of the Conductive packages on Hackage, the Haskell package repository. In the meantime, a rough version of the code can be found at this URL: http://renickbell.net/code/generalized-density.zip


With the modifications described above, the system certainly gained an additional degree of freedom. The system should now serve better as a platform for testing various algorithmic composition techniques. That could include more complex L-systems, stochastic systems, and so on.

Using ratios for increasing density works fairly well as long as the ratios are very simple, like 0.5. Other ratios generate patterns that are likely less familiar to listeners, and thus might not be appropriate if the composer intends to produce music that fits neatly within most existing genres. However, this was just an example, and more sophisticated methods can now be tested more easily.

The use of L-systems is also interesting, but it will take additional practice to become acquainted with L-systems and with how, in the middle of a performance, to write rules and axioms that produce interesting output. As with the density function above, these L-system functions are simple examples that can be refined or replaced in future work. They simply demonstrate that generalization was possible.
Abstract

This paper suggests a modified dance music DJ performance, based on common DJing techniques enriched with live coding moments, either mixed with records or not, instead of only playing previously made tracks. That way, the different possibilities offered by live coding are combined with commercial tracks, promoting live coding while maintaining the dance music atmosphere and opening more improvisation possibilities for the DJ. It is also a fun way to start learning programming and live coding. All of the software involved is open-source, and the workflow is based on the author’s own. The primary intention here is to encourage DJs to try live coding, while at the same time helping to bring its sound to audiences other than experimental music enthusiasts.

Keywords 

live coding, DJing, live performance, education 

1 Introduction 

A DJ’s work consists of extensive research to find the most suitable tracks for a specific venue and of selecting a good order in which to play them as a set. Also important is the ability to mix, which consists of beat-matching two tracks and making a transition from one to the other, maintaining a continuous flow instead of starting each track separately [Broughton and Brewster, 2006] [Collins et al., 2013]. Mixxx¹ [Andersen, 2003] is an open-source digital DJing software suited for that.

Another type of live music performance is live coding, also called on-the-fly programming, in which the programmer/performer augments or modifies code while it is running and generating real-time sound, without the need to stop or restart the program [Wang and Cook, 2004]. Many languages are suited for this task, and one of the most popular is the open-source SuperCollider² [McCartney, 2002].

¹ http://www.mixxx.org/

² http://supercollider.sourceforge.net/


In this paper a workflow based on a mixture of DJing and live coding will be described. I believe that documenting this process, which is a very simple one but presents some technicalities, might benefit seasoned DJs or encourage them to incorporate new tools into their sets. At the same time I hope that it helps promote live coding at mainstream venues and bring open-source software to more users.

In sections 2 and 3 DJing and live coding 
practices will be detailed. Section 4 describes 
the live-coding-DJing method. Conclusions and 
perspectives are analyzed in section 5. 

2 DJing 

The term DJ comes from Disk Jockey, referring to the act of playing vinyl records for an audience. In a typical configuration, a DJ would work with two turntables and a mixer, so that two songs could be played simultaneously, which is essential for making transitions.

When CDs became available, more and more people switched to the smaller and cheaper discs and CD decks, and nowadays many DJs work only with a laptop computer. “(...) If they lugged large boxes of records with them in the 1990s, after the millennium a tendency was evident toward lighter-weight luggage allowed by fully digital track management in hard-drive disk jockeying” [Collins et al., 2013].

No matter which medium is chosen, the DJ’s task is the same: playing records one after the other and mixing them, which consists of beat-matching (getting the tracks to play at the same tempo, with their beats synchronized [Broughton and Brewster, 2003]) and then making the transition, which can be a blend (a gradual fade-out of the first track and fade-in of the second) or a cut (a sudden change of the track being played).

The beat-matching stage is typically done with headphones: while the current song is played to the audience, the next one is played on the DJ’s headphones. Then the tempo might be adjusted, and after that the beats synchronized. Instead of using the headphones, one could beat-match by looking at the tracks’ waveforms, assisted by tempo and beat detection software. There is also the more extreme possibility of doing it by ear without the aid of headphones, mixing while both songs are played to the audience.
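The arithmetic behind beat-matching can be illustrated with a short Haskell sketch. It is a generic calculation, not how Mixxx or any other DJ software implements tempo sync; rateFor, faderPercent and beatOffset are hypothetical names.

```haskell
-- Playback-rate factor that brings a track recorded at trackBpm to
-- targetBpm (1.0 means unchanged).
rateFor :: Double -> Double -> Double
rateFor targetBpm trackBpm = targetBpm / trackBpm

-- The same adjustment expressed as a pitch-fader percentage.
faderPercent :: Double -> Double -> Double
faderPercent targetBpm trackBpm = (rateFor targetBpm trackBpm - 1) * 100

-- Once both tracks run at bpm: how far (in seconds) the incoming track's
-- beat trails the nearest earlier beat of the playing track, wrapped into
-- one beat period [0, 60 / bpm).
beatOffset :: Double -> Double -> Double -> Double
beatOffset bpm playingBeat incomingBeat =
  d - period * fromIntegral (floor (d / period) :: Int)
  where
    period = 60 / bpm
    d      = incomingBeat - playingBeat
```

Bringing a 120 BPM track to 126 BPM needs a rate of 1.05, i.e. the pitch fader at +5%; without key lock, the pitch rises by the same factor. If at 120 BPM the incoming track's beat falls at 9.875 s and the playing track's at 10.0 s, beatOffset 120 10.0 9.875 gives 0.375 s, which can be corrected either by delaying the incoming track by 0.125 s or by advancing it by 0.375 s.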

Some techniques [Broughton and Brewster, 2003] might make the blend more interesting:

• a simple blend using the faders: fade one track out and the other one in (fade durations depend on the genre);

• matching phrases: dance music tracks are divided into 4-bar phrases. Usually there are clues at the end of these phrases, e.g. a cymbal crash, extra drum beats or an instrument finishing a solo. When mixing, it is important to match phrases and beats;

• take advantage of keys, and avoid key clashes: some tracks sound bad when mixed because their notes are not in the same key. Some alternatives in this situation are mixing when one (or both) of them is just percussion, or pitch-shifting one of them;

• equalization can be used to hide parts of a song while keeping others: one example is removing the bass line from a track and introducing the bass line of a new song, then doing the same with the other parts;

• matching rhythms: some tracks fit together better than others because, besides their harmonies and melodies matching, their rhythms fit perfectly together. Difficulties arise when mixing two tracks with syncopation or too many drumbeats.
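The first technique, a simple blend, can be expressed as a pair of gain curves. The following Haskell sketch is illustrative only: progress, linearBlend, equalPowerBlend and mixSample are hypothetical names, and the equal-power variant is a common crossfader curve rather than something prescribed by Broughton and Brewster.

```haskell
-- Progress through a blend of dur seconds at time t, clamped to [0, 1].
progress :: Double -> Double -> Double
progress dur t = max 0 (min 1 (t / dur))

-- Linear fade: gains (outgoing, incoming) always sum to 1.
linearBlend :: Double -> Double -> (Double, Double)
linearBlend dur t = (1 - p, p)
  where
    p = progress dur t

-- Equal-power fade: the squared gains sum to 1, which avoids the dip in
-- loudness a linear fade tends to cause midway for uncorrelated material.
equalPowerBlend :: Double -> Double -> (Double, Double)
equalPowerBlend dur t = (cos (p * pi / 2), sin (p * pi / 2))
  where
    p = progress dur t

-- Mix one sample of each track with the given gains.
mixSample :: (Double, Double) -> Double -> Double -> Double
mixSample (gOut, gIn) outgoing incoming = gOut * outgoing + gIn * incoming
```

Halfway through an eight-second blend, linearBlend 8 4 gives (0.5, 0.5), while equalPowerBlend 8 4 gives roughly (0.707, 0.707). The blend duration is the parameter that, as noted above, depends on the genre.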

Instead of blending a track with the subsequent one, a DJ may cut, which is switching sharply from one record to another without losing the beat. Cuts tend to sound better with sparse, percussive music, and bad with tracks containing continuous melodies [Broughton and Brewster, 2003]. Other alternatives are doing stops (stopping the current track and then starting the next record after a while) or spin-backs (reversing the current track and then cutting to the next one).

Although several techniques have been discussed, we should keep in mind that, as emphasized by Broughton and Brewster, the most important thing about DJing is choosing the records and the order in which to play them; after that, the crucial decision is where to put the joins. “Where the mix occurs is more important than how it occurs” [Broughton and Brewster, 2003].

DJ software (or any task-specific application) ties the user to its paradigms. All actions, control structures and interaction possibilities are defined in advance. With a rigid interface, such applications offer high visibility of the available operations and immediate gestural control for the live performer to adapt sound immediately and continuously, but with reduced potential for creative exploration [Blackwell and Collins, 2005].

Mixxx’s interface (Figure 1)³, for example, contains buttons for loading tracks into 2 different virtual decks, playing, stopping and looping them, and setting cue points (important points in the track, likely to be replayed). It also has knobs and faders for adjusting effect parameters, filters, tempo and playback rate for each track, among other functionalities, including a waveform visualizer for the loaded tracks.



Figure 1: Mixxx interface (Late Night Blues). 

3 Live-coding 

On the other hand, there is the possibility of working with interpreted computer languages, modifying running algorithms on the fly and, in the case of audio programming, getting real-time sound as the program’s output. Brown [Brown, 2006] defines live coding as a practice where “digital content is created through computer programming as a performance”.

The interface for live coding is only a text editor⁴ and a shell for feedback (although fancy IDEs are available), which means that all the actions are hidden in text commands. All the exploratory possibilities of computer languages are available to the performer, but there is no interface with buttons prepared for interaction. Therefore code must be written for all the actions involved in sound production, which can cause a high mental load, especially in the time-scarce situation of a performance.

3 Picture from http://www.mixxx.org/press/

4 For text-based languages. Graphical programming languages, where programming is done by linking objects on a canvas, are also available, e.g. PureData.

Blackwell and Collins argue that, besides aesthetic reasons, a key concern for someone choosing the challenge of live coding instead of using pre-made software is that such software is “biased towards fixed audio products in established stylistic modes, rather than experimental algorithmic music which requires the exploratory design possibilities of full programming languages” [Blackwell and Collins, 2005].

Different approaches can be taken in a live coding session. At one end is a low-level, mathematical approach: writing complex algorithms that output sound as they evolve. At the other is a higher-level, simpler approach: describing synthesis models and then sequencing sounds by specifying values for the models’ parameters. These values can result from evaluating routines or be chosen directly by the user. When working with this instrument-plus-sequencer paradigm, code can be viewed as a description of instruments and a score (a scheduling of musical events).

This last approach represents, in my opinion, the easiest way to live code, provided the artist can describe a synthesis model, even a simple one. Programming notes for melodies/harmonies and rhythm patterns is simple, so it combines perfectly well with dance music. This might be a naive approach to live coding, but it is enough to synthesize sounds within the LCDJ paradigm proposed here.

3.1 The performance 

In live coding performances the code is usually projected for the audience. According to Brown, most people like this, but some find it a display of virtuosity and a distraction from the listening experience [Brown, 2006]. Alex McLean, who plays at raves and programmers’ gatherings, said “I prefer it when the audience is dancing and doesn’t care how we’re making the music” [Andrews, 2006].

A live coding session may start either from scratch or with previously written code. In the latter case, the artist may augment, modify or fill in blanks. As time is scarce in a performance, an interesting trick is to prepare snippets of code with the basic structure and syntax of the language (or even snippets with the main sounds and patterns of a personal production).


An alternative approach to live coding is doing it in duos or larger groups, or even in an orchestra together with musicians playing acoustic instruments. That takes the pressure off a single performer and, in the case of a live coding group, shares the algorithmic complexity between the members [Collins, 2011].

4 Why not do both? LCDJ!

If someone is restricted neither to performing like a DJ nor to performing like a live coder, and can access tools for both occupations on the same computer, why not play DJ styles of music with some live coding moments? Or why not live code supported by nice tracks playing along? Why not do it if both tools are running on the same sound server?

In an interview, dance music duo Coldcut said “The future of DJing is not about whether vinyl will survive. The future of DJing is about media mixing. The DJ with two SL1200s [turntables] will fade out, but if he is clever, he’ll evolve into a multi-armed posse manipulating various sound and vision sources. There should be a new name for this, maybe a media-jockey” [Broughton and Brewster, 2003].

4.1 Suggestions for LCDJing

An LCDJ may start the performance by playing a record or by coding (releasing initial sounds after a while). In the second case, the mood can be set according to improvisational decisions made at the very time of the performance. A sound that perfectly invokes the intended atmosphere can then be synthesized, instead of having to pick one from the finite set stored on the hard drive.

An LCDJ is able to jam with records via live code, inventing new sounds or mimicking/emphasizing/satirizing the record’s. Stopping the track for a solo is also a good move. When playing a personal production, different versions of it can be improvised by modifying the code used to generate it (a good argument for producing music with code). That can also be a way to tease the audience by revealing pieces of the upcoming track, an effect similar to cutting back and forth between two beat-matched tracks [Broughton and Brewster, 2003].

Another interesting option is routing audio from Mixxx to SuperCollider and processing tracks with a virtually unlimited range of effects. This goes beyond applying only high-pass or low-pass filters or common effects such as flanging or ring modulation, which are the options available in the interfaces of Mixxx and similar programs.





In my experience, mixing is where the true potential of LCDJ is revealed. Instead of blending or cutting like a DJ, a live-coding DJ may live code between tracks. A simple guide for this can be:

1. choose the next song to play in the set and load it into Mixxx;

2. choose an instant in the current song to start interacting;

3. start live coding and interact (in any way) with the current track;

4. when the current track ends, take the live coding to a sonority suited to welcome the subsequent track;

5. choose a moment and start the new track;

6. when suitable, stop live coding and let the new track fly.

Step 4 can be done at any pace, and some ways to welcome a track are:

• invoking its rhythm;

• invoking its bass line, melody or harmony;

• making a sparse and percussive sound, preparing for a cut;

• making a totally nonsensical sonority that builds tension to be released with the next track (perfect for tracks with a sweet and melodic intro).

Of course, there is always the option of not welcoming a track and live coding for hours.

The blending techniques mentioned earlier can be adapted for LCDJing. Some suggestions:

• matching phrases: mimic a bass line or melody of the current song and keep playing it for a while until the song vanishes. Then adapt it to an element present in the next track, start that track and interact for a while;

• keys: start coding along with the current track, in the same key. When it ends, progressively add notes from another key, but without clashing. When the sonority has been taken to the same key as the next track, all is set for a good entrance;

• equalization: a great chance to modify tracks. Cut some parts of the current track, for example the bass line, and live code a new one. When suitable, introduce a sonority that resembles the next track and bring it in;

• match rhythms: be the drummer, first along with the current track, then solo, then with the new one. Link tracks using percussive lines.

The techniques presented are the ones I could come up with and test on some occasions, but there is no limit to the possibilities in LCDJing besides what one can do with code. The audience, the mood and the style of the venue give important clues about how far from the mainstream an LCDJ can go in a session.

4.2 Software involved 

There are lots of nice applications for DJing⁵ and languages suited for live coding⁶. Depending on personal choices, any combination of software can be used for LCDJing. One could even dispense with DJ software and LCDJ using only a language. In my experience, Mixxx and SuperCollider are a good pair for live-coding-DJing performances because:

• both are very efficient, so LCDJing is possible even with a netbook;

• Mixxx is very easy to learn and provides all the tools a DJ needs⁷, so common DJ actions can be done readily instead of having to code them;

• although a first contact with SuperCollider might be frightening, its syntax makes it easy and fast to code synth models and sequence patterns (all we need to LCDJ);

• both connect to Jack⁸, which allows audio routing between applications, so Mixxx and SuperCollider can communicate, sending and receiving audio to and from each other. Some advantages and possibilities of this have already been discussed;

• both are open source, with all the related advantages.

4.3 Simple Mixxx, to collide with SuperCollider

Both newcomers and artists used to other DJ applications will find it intuitive to work with Mixxx. Its interface is really simple and everything necessary for LCDJ is available as a keyboard shortcut. A quick read of the wiki⁹ is enough to become familiar with it. The community forums¹⁰ are also good resources.

5 http://linux-sound.org/ddj.html

6 http://toplap.org/wiki/ToplapSystems

7 http://www.mixxx.org/features/

8 http://jackaudio.org/

In the author’s opinion, a good practice for DJing, especially LCDJing, is to avoid the mouse (time is scarce) and external controllers (saving money and space in the booth/backpack and making use of the laptop hardware as a whole).

4.4 Simple SuperCollider, to mix with Mixxx

As highlighted above, the live coding strategy proposed here is very simple: only instrument definitions and a way to sequence sounds are necessary. For the instrument definitions, the class SynthDef is used. Its syntax is shown in Figure 2, where a simple sawtooth oscillator with an ADSR envelope is defined.

SynthDef(\saw_ex, {
    |out=0, gate=1, pan=0, freq=100,
     a=0.1, d=0.2, s=0.8, r=0.3,   // envelope arguments
     mix=0.5, room=0.8|            // reverb arguments
    // ADSR envelope; doneAction: 2 frees the synth when the release ends
    var env = EnvGen.kr(Env.adsr(a, d, s, r),
        gate: gate, doneAction: 2);
    var snd = Saw.ar(freq);
    snd = snd * env;
    snd = Pan2.ar(snd, pan);           // mono to stereo
    snd = FreeVerb.ar(snd, mix, room); // dry/wet mix and room size
    Out.ar(out, snd);
}).add;

Figure 2: Defining an instrument. 

The classes Pdef and Pbind can be used for sequencing sounds. The syntax is presented in Figure 3, with a pattern that sequentially repeats the notes D, F, A (the degrees 1, 3, 5 are converted into freq values), along with values for each note’s duration and panning. Pseq picks the values of its argument array in sequence.
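For readers unfamiliar with SuperCollider’s pitch model, the degree-to-frequency step can be sketched in Python. This sketch assumes the default event settings: major scale, root 0 and octave 5, so that degree 0 is MIDI note 60 (middle C). It also assumes equal temperament with A4 = 440 Hz. The function names are ours, and the code only mimics the default mapping rather than reproducing SuperCollider’s implementation.

```python
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]  # semitone offset of each scale degree
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def degree_to_midinote(degree, scale=MAJOR_SCALE, root=0, octave=5):
    """Map a scale degree to a MIDI note number (degree 0 -> 60 with the defaults)."""
    octave_shift, index = divmod(degree, len(scale))  # wrap into neighbouring octaves
    return (octave + octave_shift) * 12 + root + scale[index]

def midinote_to_freq(midinote):
    """Equal-tempered frequency in Hz, with MIDI note 69 = A4 = 440 Hz."""
    return 440.0 * 2.0 ** ((midinote - 69) / 12)

if __name__ == "__main__":
    for degree in (1, 3, 5):  # the degrees used in Figure 3
        note = degree_to_midinote(degree)
        print(degree, NOTE_NAMES[note % 12], round(midinote_to_freq(note), 2))
```

With these defaults, degrees 1, 3 and 5 map to MIDI notes 62, 65 and 69, i.e. D, F and A (the last one at exactly 440 Hz). Negative or large degrees wrap into neighbouring octaves.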

Notice that each note’s attack time and reverb mix (dry=0/wet=1) will have a random value (Prand randomly picks values from its argument array), so the instrument’s sound will always be changing. Models with more parameters can offer a wide timbre variation.
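For readers coming from other languages, the behaviour of Pseq and Prand in Figure 3 can be approximated with Python generators. This is only an analogy under simplifying assumptions: every stream repeats forever, and Prand is modelled as an independent uniform choice per event. The helper names are hypothetical, and this is not how SuperCollider’s pattern library is implemented.

```python
import itertools
import random

def pseq(values):
    """Cycle through `values` in order, forever (analogous to Pseq(values, inf))."""
    return itertools.cycle(values)

def prand(values, rng=random):
    """Pick a random element of `values` for every event (analogous to Prand(values, inf))."""
    while True:
        yield rng.choice(values)

def pbind(**streams):
    """Combine one stream per key into an endless stream of event dictionaries."""
    while True:
        yield {key: next(stream) for key, stream in streams.items()}

if __name__ == "__main__":
    events = pbind(
        degree=pseq([1, 3, 5]),
        dur=pseq([1, 1, 2]),
        a=prand([0.1 * x for x in (5, 11, 22, 33)]),
        pan=pseq([-1, 1, -0.5, 0.5]),
        mix=prand([0.2, 0.5, 0.9]),
    )
    for event in itertools.islice(events, 6):
        print(event)
```

Because the degree and pan sequences have lengths 3 and 4, their combination only repeats every 12 events. This is a cheap source of variation even before any randomness is added.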

The equivalent of this simple and flexible 30-line implementation would hardly (if at all) be attainable in more rigid interfaces. Other options for sequencing and more complex synthesis models (and much more information) can be found on the learning SuperCollider page¹¹ and in the SC community¹².

9 http://www.mixxx.org/wiki/doku.php
10 http://www.mixxx.org/forums/index.php
11 http://supercollider.sourceforge.net/learning/
12 http://supercollider.sourceforge.net/community/


Pdef(\play_saw,
    Pbind(
        \instrument, \saw_ex,
        \degree, Pseq([1, 3, 5], inf),          // D, F, A in the default scale
        \dur, Pseq([1, 1, 2], inf),
        \a, Prand(0.1 * [5, 11, 22, 33], inf),  // random attack per note
        \d, 0.5, \s, 0.95, \r, 0.15,
        \pan, Pseq([-1, 1, -0.5, 0.5], inf),
        \mix, Prand([0.2, 0.5, 0.9], inf),      // random reverb mix per note
        \room, 0.5
    )
).play;

Figure 3: Specifying parameter values and sequencing sounds.

5 Conclusions 

DJing is a long-established practice, and live coding a more recent one, but the latter has certainly already established its importance in contemporary music practice. The workflow described in this paper does not intend to dismiss either practice. It only mixes them in a way that makes it simple for the seasoned DJ, or anyone, to try live coding and benefit from it. At the same time, it is a fun and stimulating way to start programming, dive into synthesis studies and learn more about open-source software. Surely it only opens new possibilities for the artist.

Describing synthesis models, although a difficult task at first, is definitely worth it. The producer’s sound palette will grow to the point that, besides having personal productions/tracks, a characteristic sound can also be achieved. In a world where most commercial electronic dance music sounds so alike, producing one’s own tracks and timbres is a good form of promotion. Sharing snippets of code with nice sounds and/or patterns also seems to be a good idea.

A pure live coding session aiming at experimental music requires much more than the simple concepts presented here. With this approach, however, a good level of interaction with dance music is possible because of its structure, which is usually rhythmic and well defined, with distinctive melodies and harmonies.

Whatever genre of electronic music the DJ wants to play, interaction with live coding is possible, from abstract Ambient sounds to the rhythmic beats of mainstream House, even with the simple paradigm described in this paper. Synthesis models can be as varied as the creativity/ability of the artist. The instruments can be sequenced with a fixed or (widely) varying timbre. The rhythm and note patterns can also be freely specified, even randomly. (Why not go for a track that wasn’t intended, just because a random pattern resembled it?)

Although studio productions are not the intended output of an LCDJ session, extra care must be taken with the rawness of the synthesized audio. Mainstream dance music records are equalized, pre-mixed, well balanced, compressed and mastered. To fit new sounds in, some sculpting is therefore necessary; otherwise they take the foreground and mask the record. Usually, extra equalization of the record, plus a little reverb and good positioning with panning of the live-coded sounds (which is why these appear in the example), is enough to find them a spot and prevent clashes.

A door to true improvisation opens with LCDJ. DJs know specific tracks that invoke different types of emotions, and DJing is based on improvisation according to the audience’s mood. Even so, the set of possibilities is finite unless music is created on the fly.

In the same way that performing in a group relieves part of the pressure on each artist, live coding along with records has a similar effect. More time is available to analyze and shape sounds, impose a rhythm and write code.

Screen projection, although explored in pure live coding sessions, may be discarded in LCDJ: the code may be too simple, and it would be a distraction for dancers and a spoiler for the set. However, this depends on the venue, as more advanced programmers and specific audiences might like it.

Of course, the practice is not restricted to Mixxx and SuperCollider. Great software and languages are available for DJing (xwax, terminatorX, etc.) and live coding (ChucK, PureData, etc.). However, Mixxx’s interface might be more familiar to seasoned DJs, especially those who work with turntables/decks or with operating systems other than Linux. SuperCollider’s efficiency, along with its Patterns library (easy to learn and use), makes it a good option to start with.

LCDJing would also be possible by dispensing with the DJ software and using only a live coding language. However, a DJ application facilitates common DJ tasks, such as creating and managing a playlist during the performance, adjusting tempo with a knob twist, cueing points in tracks and scratching. This relieves the mental load that coding every move would create.

My impression from playing as an LCDJ is that people accept rhythmic live coding moments as unknown yet good track passages, when they are appropriately presented and not overdone. More abstract coding moments bring tension and curiosity, which call for that magical record.
Abstract

The authors offer an introductory walk-through of professional audio signal measurement and visualisation.

The presentation focuses on the SiSco.lv2 (Simple Audio Signal Oscilloscope) and the Meters.lv2 (Audio Level Meters) LV2 plugins, which have been developed since August 2013. The plugin bundle is a super-set, built upon existing tools with added novel GUIs (e.g. ebur128, jmeters, ...), and features new meter types and visualisations unprecedented on GNU/Linux (e.g. true-peak, phase-wheel, ...). Various meter types are demonstrated and the motivation for using them is explained.

The accompanying documentation provides an overview of instrumentation tools and measurement standards in general, emphasising the requirement to provide a reliable and standardised way to measure signals.

The talk is aimed at developers who validate DSP 
during development, as well as sound-engineers who 
mix and master according to commercial constraints. 

Keywords 

Audio Level Metering, Visualisation, LV2, DSP 

1 Introduction 

Audio level meters are very powerful tools that 
are useful in every part of the production chain: 

• When tracking, meters are used to ensure that input signals do not overload and maintain reasonable headroom.

• Meters offer a quick visual indication of activity when working with a large number of tracks.

• During mixing, meters provide a rough estimate of the loudness of each track.

• At the mastering stage, meters are used to check compliance with upstream level and loudness standards, and to optimise the dynamic range for a given medium.
Similarly, for technical engineers, reliable measurement tools are indispensable for the quality assurance of audio effects or any professional audio equipment.

2 Meter Types and Standards 

For historical and commercial reasons various 
measurement standards exist. They fall into 
three basic categories: 

• Focus on medium: highlight digital number, or analogue level constraints.

• Focus on message: provide a general indication of loudness as perceived by humans.

• Focus on interoperability: strict specification for broadcast.

For in-depth information about metering 
standards, their history and practical use, 
please see [Brixen, 2010] and [Watkinson, 2000]. 

2.1 Digital peak-meters 

A Digital Peak Meter (DPM) displays the absolute maximum signal of the raw samples in the PCM signal (for a given time). It is commonly used when tracking to make sure the recorded audio never clips. To that end, DPMs are calibrated to 0 dBFS (decibels relative to full scale), the maximum level that can be represented digitally in a given system. This value has no musical connection whatsoever and depends only on the properties of the signal chain or target medium. There are conventions for fall-off time and peak-hold, but no exact specifications. Furthermore, DPMs operate on raw digital sample data, which does not take inter-sample peaks into account; see section 2.7.
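As a minimal sketch, assuming floating-point samples normalised so that ±1.0 is full scale, a digital peak meter reduces to the largest absolute sample value in a block, expressed in dBFS. The peak-hold time and linear fall-off rate below are illustrative values chosen for this example, since there is no exact specification for them.

```python
import math

def peak_dbfs(block):
    """Largest absolute sample of a block in dBFS (full scale = 1.0)."""
    peak = max((abs(s) for s in block), default=0.0)
    return 20.0 * math.log10(peak) if peak > 0.0 else -math.inf

class PeakMeter:
    """DPM readout with peak-hold and linear fall-off (illustrative ballistics)."""

    def __init__(self, block_seconds, hold_seconds=2.0, falloff_db_per_s=20.0):
        self.hold_blocks = int(hold_seconds / block_seconds)
        self.falloff_db = falloff_db_per_s * block_seconds  # dB drop per block
        self.display = -math.inf  # level shown by the bar
        self.hold = -math.inf     # peak-hold marker
        self.hold_age = 0         # blocks since the marker was last set

    def process(self, block):
        level = peak_dbfs(block)
        # The bar jumps up instantly but falls back at a limited rate.
        self.display = max(level, self.display - self.falloff_db)
        # The marker is reset by a new maximum or once the hold time expires.
        if level >= self.hold or self.hold_age >= self.hold_blocks:
            self.hold, self.hold_age = level, 0
        else:
            self.hold_age += 1
        return self.display, self.hold
```

Feeding the meter one block per display refresh gives the familiar behaviour: the bar follows transients immediately but decays smoothly, while the hold marker stays at the loudest recent peak.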

2.2 RMS meters 

An RMS (Root Mean Square) type meter is an 
averaging meter that looks at the energy in the 
signal. It provides a general indication of loud¬ 
ness as perceived by humans. 
[Figure 1: VU, BBC, EBU, DIN and Nordic meter scales plotted against dBu and dBFS (IEC 60268-17 and IEC 268-10 variants).]

Figure 1: Various meter alignment levels as specified by the IEC. The common reference level is 0 dBu calibrated to -18 dBFS for all types except DIN, which aligns +9 dBu to -9 dBFS. dBu refers to voltage in an analogue system, while dBFS refers to digital signal full scale.

Bar-graph RMS meters often include an additional DPM indicator for practical reasons. The latter shows medium specifics and gives an indication of the crest factor (peak-to-average power ratio) when compared to the RMS meter.
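A minimal sketch of an RMS meter with its companion peak indicator is shown below. It assumes normalised floating-point samples and a plain rectangular averaging window. Real meters use specific integration times or exponential averaging, which are conventions rather than fixed standards. The difference between the two readings in dB is the crest factor. RMS is reported here without the +3 dB offset that some meters apply so that a full-scale sine reads 0 dB.

```python
import math

def rms_dbfs(block):
    """Root-mean-square level of a block in dBFS (no sine-wave offset applied)."""
    mean_square = sum(s * s for s in block) / len(block)
    return 10.0 * math.log10(mean_square) if mean_square > 0.0 else -math.inf

def crest_factor_db(block):
    """Peak-to-RMS ratio in dB: peak dBFS minus RMS dBFS."""
    peak = max(abs(s) for s in block)
    return 20.0 * math.log10(peak) - rms_dbfs(block)

if __name__ == "__main__":
    n = 48000
    sine = [math.sin(2 * math.pi * 1000 * i / n) for i in range(n)]  # 1 kHz, 1 s
    square = [1.0 if s >= 0.0 else -1.0 for s in sine]
    print(f"sine:   rms {rms_dbfs(sine):.2f} dBFS, crest {crest_factor_db(sine):.2f} dB")
    print(f"square: rms {rms_dbfs(square):.2f} dBFS, crest {crest_factor_db(square):.2f} dB")
```

A full-scale sine reads about -3.01 dBFS RMS, with a crest factor of about 3.01 dB. A full-scale square wave reads 0 dBFS on both meters. Material with sharp transients shows a much larger gap between the two bars.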

Similar to DPMs, there is no fixed standard regarding ballistics and alignment level for a general RMS meter, but various conventions do exist, most notably the K-system introduced by Bob Katz [Katz, 2000].

2.3 IEC PPMs 

IEC (International Electrotechnical Commission) type Peak Programme Meters (PPM) are a mix between DPMs and RMS meters, created mainly for the purpose of interoperability. Many national and institutional varieties exist: European Broadcasting Union (EBU), British Broadcasting Corporation (BBC), Deutsche Industrie-Norm (DIN), ... [Wikipedia, 2013].

These loudness and metering standards provide a common point of reference, which is used by broadcasters in particular so that the interchange of material is uniform across their sphere of influence, regardless of the equipment used to play it back. See Fig. 1 for an overview of reference levels.


For home recording, there is no real need for this level of interoperability, and these meters are only strictly required when working in or with the broadcast industry. However, IEC-type meters have certain characteristics (rise time, ballistics) that make them useful outside the context of broadcast.

Their specification is very exact [IEC, 1991], and consequently there are no customisable parameters.



Figure 2: Various meter types from the meters.lv2 plugin bundle fed with a -18 dBFS 1 kHz sine wave. Note: the bottom right depicts the stereo phase correlation meter of a mono signal.

2.4 EBU R-128 

European Broadcasting Union Recommendation 128 is a rather new standard that goes beyond the audio-levelling paradigm of PPMs.

It is based on the ITU-R BS.1770 loudness algorithm [ITU, 2006], which defines a weighting filter, amongst other details, to deal with multichannel loudness measurements. To differentiate it from level measurement, the ITU and EBU introduced a new term, ‘LU’ (Loudness Unit), equivalent to one decibel¹. The term ‘LUFS’ is then used to indicate Loudness Units relative to Full Scale.

In addition to the average loudness of a programme, the EBU recommends that the ‘Loudness Range’ and ‘Maximum True Peak Level’ be measured and used for the normalisation of audio signals [EBU, 2010].

1 The ITU specification uses ‘LKFS’ (Loudness using the K-Filter, with respect to Full Scale), which is identical to ‘LUFS’.
The target level for audio is defined as -23 LUFS, and the maximum permitted true-peak level of a programme during production shall be -1 dBTP.

The integrated loudness measurement is intended to quantify the average programme loudness over an extended period of time, usually a complete song or an entire spoken-word feature [Adriaensen, 2011], [EBU, 2011].

Many implementations go beyond displaying the range and include a history and histogram of the Loudness Range in the visual readout. This addition comes at no extra cost, because the algorithm that calculates the range already has to keep track of the signal’s history to some extent.

Three types of response should be provided by a loudness meter conforming to R-128:

• Momentary response. The mean squared level over a window of 400 ms.

• Short-term response. The average over 3 seconds.

• Integrated response. An average over an extended period.
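The following sketch computes a momentary (400 ms) loudness reading for a single channel. It assumes a 48 kHz sample rate and uses the two-stage K-weighting biquad coefficients published in ITU-R BS.1770 for that rate. Channel weighting, gating, and the short-term and integrated responses are omitted. It therefore illustrates the core of the algorithm rather than being a conforming R-128 meter.

```python
import math

# K-weighting biquads from ITU-R BS.1770 for fs = 48 kHz: (b0, b1, b2, a1, a2)
SHELF = (1.53512485958697, -2.69169618940638, 1.19839281085285,
         -1.69065929318241, 0.73248077421585)
HIGHPASS = (1.0, -2.0, 1.0, -1.99004745483398, 0.99007225036621)

def biquad(samples, coeffs):
    """Direct-form I biquad filter (a0 normalised to 1)."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def momentary_lufs(samples, fs=48000):
    """Loudness of the last 400 ms of a mono signal in LUFS (no gating)."""
    if fs != 48000:
        raise ValueError("the coefficients in this sketch are only valid for 48 kHz")
    weighted = biquad(biquad(samples, SHELF), HIGHPASS)
    window = weighted[-int(0.4 * fs):]
    mean_square = sum(y * y for y in window) / len(window)
    return -0.691 + 10.0 * math.log10(mean_square)

if __name__ == "__main__":
    fs = 48000
    tone = [math.sin(2 * math.pi * 997 * i / fs) for i in range(fs)]  # 1 s, 0 dBFS
    print(f"{momentary_lufs(tone):.2f} LUFS")
```

A full-scale 997 Hz sine in one channel should read close to -3.01 LUFS, which is the calibration the -0.691 constant is intended to provide. Because LU are decibel-sized, the gain needed to normalise a programme to the -23 LUFS target is simply the target minus the measured integrated loudness.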



Figure 3: EBU R-128 meter GUI with histogram (left) and history (right) view.

2.5 VU meters 

Volume Unit (VU) meters are the dinosaurs (1939) amongst meters.

The VU meter (intentionally) “slows” measurement, averaging out peaks and troughs of short duration, and more closely reflects the perceived loudness of the material [Wikipedia, 2014]. As such, it was intended to help programme producers create consistent loudness amongst broadcast programme elements.

In contrast to all the previously mentioned types, VU meters use a linear scale (in 1939 logarithmic amplifiers were physically large). The meter’s designers assumed that a recording medium with at least 10 dB headroom over 0 VU would be used, and the ballistics were designed to “look good” with the spoken word.

Their specification is very strict (300 ms rise time, 1-1.5% overshoot, flat frequency response), but various national conventions exist for the 0 VU alignment reference level. The most commonly used was standardised in 1942 in ASA C16-5-1942: “The reading shall be 0 VU for an AC voltage equal to 1.228 Volts RMS across a 600 Ohm resistance”².
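The footnoted equivalence can be checked with a short calculation. dBu is referenced to 0.7746 V RMS (the voltage that dissipates 1 mW in 600 Ω), so 1.228 V RMS is about +4 dBu. The conversion to dBFS below assumes the common 0 dBu = -18 dBFS alignment described for Figure 1; other alignments only change the offset.

```python
import math

DBU_REF_VOLTS = math.sqrt(0.6)  # 1 mW into 600 ohm -> about 0.7746 V RMS

def volts_to_dbu(volts_rms):
    """Analogue level in dBu for a given RMS voltage."""
    return 20.0 * math.log10(volts_rms / DBU_REF_VOLTS)

def dbu_to_dbfs(dbu, alignment_offset_db=-18.0):
    """Digital level, assuming 0 dBu is aligned to `alignment_offset_db` dBFS."""
    return dbu + alignment_offset_db

if __name__ == "__main__":
    vu_ref = volts_to_dbu(1.228)
    print(f"0 VU (1.228 V RMS) = {vu_ref:+.2f} dBu = {dbu_to_dbfs(vu_ref):+.2f} dBFS")
```

With the same -18 dB offset, the DIN alignment of +9 dBu maps to -9 dBFS, consistent with the caption of Figure 1.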

2.6 Phase Meters 

A phase meter shows the amount of phase difference in a pair of correlated signals. It allows the sound technician to adjust for optimal stereo and to diagnose mistakes such as an inverted signal. Furthermore, it provides an indication of mono compatibility and of the possible phase cancellation that takes place when a stereo signal is mixed down to mono.

2.6.1 Stereo Phase Correlation Meters 

Stereo Phase Correlation Meters are usually 
needle style meters, showing the phase from 0 to 
180 degrees. There is no distinction between 90 
and 270 degree phase-shifts since they produce 
the same amount of phase cancellation. The 0 
point is sometimes labelled “+1”, and the 180 
degree out-of-phase point “-1”. 
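A stereo phase correlation meter is commonly computed as the normalised cross-correlation of the two channels over a short window. The sketch below assumes a rectangular window and returns a value between -1 (180° out of phase) and +1 (mono). For two sine waves of equal frequency the result is the cosine of their phase difference, which is why 90° and 270° both read 0.

```python
import math

def phase_correlation(left, right):
    """Normalised correlation of two equally long channels, in [-1, +1]."""
    num = sum(l * r for l, r in zip(left, right))
    energy = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / energy if energy > 0.0 else 0.0  # silence: report 0

if __name__ == "__main__":
    fs, f, n = 48000, 1000, 4800  # 100 ms: a whole number of 1 kHz periods
    for shift_deg in (0, 90, 180, 270):
        phi = math.radians(shift_deg)
        left = [math.sin(2 * math.pi * f * i / fs) for i in range(n)]
        right = [math.sin(2 * math.pi * f * i / fs + phi) for i in range(n)]
        print(shift_deg, round(phase_correlation(left, right), 3))
```

A practical meter would update this value continuously, for example with overlapping windows or smoothed running sums, so that the needle moves steadily rather than jumping from block to block.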

2.6.2 Goniometer 

A Goniometer plots the signal on a two-dimensional area so that the correlation between the two audio channels becomes visually apparent (example in Fig. 8). The principle is also known as Lissajous curves or X-Y mode in oscilloscopes. The goniometer proves useful because it provides very dense information in an analogue and surprisingly intuitive form. From the display, one can get a good feel for the audio level of each channel, the amount of stereo and its compatibility as a mono signal, and even, to some degree, what frequencies are contained in the signal. Experts may even be able to determine the probable arrangement of microphones when the signal was recorded.
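The goniometer display itself is a simple coordinate transform. Each pair of samples is plotted as a point after rotating the L/R axes by 45°, so that the mid (L+R) signal runs vertically and the side (L-R) signal horizontally. The sketch below assumes that orientation, with a hard-left signal leaning towards the upper left; actual instruments may mirror it.

```python
import math

def goniometer_points(left, right):
    """Map stereo sample pairs to (x, y) display points rotated by 45 degrees.

    y = mid = (L + R) / sqrt(2) and x = side = (R - L) / sqrt(2), so a mono
    signal draws a vertical line, an out-of-phase signal a horizontal one,
    and a hard-left signal a diagonal towards the upper left.
    """
    k = 1.0 / math.sqrt(2.0)
    return [((r - l) * k, (l + r) * k) for l, r in zip(left, right)]

if __name__ == "__main__":
    sig = [math.sin(2 * math.pi * i / 16) for i in range(16)]
    mono = goniometer_points(sig, sig)
    inverted = goniometer_points(sig, [-s for s in sig])
    print(max(abs(x) for x, _ in mono), max(abs(y) for _, y in inverted))
```

Wide stereo material fills an ellipse or cloud around the vertical axis. The more the cloud spreads horizontally, the less mono-compatible the signal.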

2.6.3 Phase/Frequency Wheel 

The Phase Wheel is an extrapolation of the Phase Meter. It displays the full 360 degree signal phase and separates the signal phase by frequency. It is a rather technical tool, useful, for example, for aligning tape heads; see Fig. 4.

Figure 4: Phase/Frequency Wheel. Left: pink noise, 48 kSPS, with the right channel delayed by 5 samples relative to the left channel. Right: digitisation of a mono 1/2” tape reel with slight head misalignment.

2 This corresponds to +4 dBu.
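The left example of Figure 4 (a 5-sample delay at 48 kSPS) can be related to phase by noting that a fixed delay produces a phase shift that grows linearly with frequency. Assuming a pure delay of d samples at sample rate fs, the phase difference at frequency f is 360° · f · d / fs. A short worked example follows; the function names are ours.

```python
def delay_phase_degrees(freq_hz, delay_samples, fs=48000):
    """Phase shift (0..360 deg) caused by delaying one channel by `delay_samples`."""
    return (360.0 * freq_hz * delay_samples / fs) % 360.0

def full_turn_frequency(delay_samples, fs=48000):
    """Lowest frequency at which the delay amounts to a whole period (360 deg)."""
    return fs / delay_samples

if __name__ == "__main__":
    d = 5  # the 5-sample delay of Figure 4 (left), at 48 kSPS
    print(f"delay = {1e6 * d / 48000:.1f} us, full turn at {full_turn_frequency(d):.0f} Hz")
    for f in (100, 1000, 2400, 4800, 9600):
        print(f, delay_phase_degrees(f, d))
```

With this delay, components at 2.4 kHz are shifted by 90°, those at 4.8 kHz are fully out of phase, and the phase wraps around every 9.6 kHz.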

2.7 Digital True-Peak Meters 

A True-Peak Meter is a digital peak meter with additional data pre-processing. The audio signal is up-sampled (usually by a factor of four [ITU, 2006]) to take inter-sample peaks into account. Although the scale is identical to that of a DPM, true-peak meters use the unit dBTP (decibels relative to full scale, measured as a true-peak value) instead of dBFS. dBTP is identical to dBFS except that it may be larger than zero (full scale) to indicate peaks.

Inter-sample peaks are not a problem while the signal remains in the digital domain; they can, however, introduce clipping artefacts or distortion once the signal is converted back to an analogue signal.


floating point audio data     mathematical true peak value
.. 0 0 +1 +1 0 0 ..           +2.0982 dBTP
.. 0 0 +1 -1 0 0 ..           +0.7655 dBTP

Table 1: True-peak calculations @ 44.1 kSPS; both examples correspond to 0 dBFS.

Fig. 5 illustrates the issue. Inter-sample peaks are one of the important factors that necessitate headroom in the various standards. Table 1 provides a few examples where traditional meters will fail to detect clipping of the analogue signal.
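The true-peak values in Table 1 can be approximated by evaluating the reconstructed waveform on a fine grid between samples. This assumes ideal band-limited (sinc) reconstruction, with all samples outside the listed ones equal to zero. It is a brute-force illustration. A practical true-peak meter instead uses a finite interpolation filter at (usually) 4x oversampling, as described above, and may under-read slightly.

```python
import math

def sinc(x):
    """Normalised sinc: sin(pi x) / (pi x)."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def sample_peak_dbfs(samples):
    """What a plain DPM shows: the largest absolute sample value."""
    return 20.0 * math.log10(max(abs(s) for s in samples))

def true_peak_dbtp(samples, oversample=64):
    """Peak of the ideal sinc reconstruction, evaluated `oversample` times per sample."""
    peak = 0.0
    for k in range((len(samples) - 1) * oversample + 1):
        t = k / oversample
        value = sum(s * sinc(t - i) for i, s in enumerate(samples) if s != 0.0)
        peak = max(peak, abs(value))
    return 20.0 * math.log10(peak)

if __name__ == "__main__":
    for data in ([0, 0, 1, 1, 0, 0], [0, 0, 1, -1, 0, 0]):
        print(data, f"DPM {sample_peak_dbfs(data):.1f} dBFS,",
              f"true peak {true_peak_dbtp(data):+.4f} dBTP")
```

For these two inputs the sketch returns approximately +2.098 dBTP (exactly 20·log10(4/π) for the first one) and +0.765 dBTP. These are close to the values in Table 1, while a sample-peak reading is 0 dBFS in both cases.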

2.8 Spectrum Analysers 

Spectrum analysers measure the magnitude of an input signal versus frequency. By analysing the spectra of electrical signals, dominant frequency, power, distortion, harmonics, bandwidth and other spectral components can be observed. These are not easily detectable in time-domain waveforms.

Figure 5: Inter-sample peaks in a sine wave. The red line (top and bottom) indicates the digital peak; the actual analogue sine wave (black) corresponding to the sampled data (blue dots) exceeds this level.

Figure 6: 30-band 1/3-octave spectrum analyser.

Traditionally, they are a combination of band-pass filters and an RMS signal level meter per band, which measures the signal power for a discrete frequency band of the spectrum. This is a simple form of a perceptual meter. A well-known specification is the 1/3-octave 30-band spectrum analyser standardised in IEC 61260 [IEC, 1995]. The frequency bands are spaced logarithmically, in fractions of an octave, which provides a flat readout for a pink-noise power spectrum, not unlike the human ear.

As with all IEC standards, the specifications are very precise, yet within IEC 61260 a number of variants are available to trade off implementation details. Three classes of quality are defined, which differ in the filter-band attenuation (band overlap): class 0 is the best and class 2 the worst acceptable. Furthermore, two variants are offered for the filter-frequency bands: base ten and base two. The centre frequency in either case is 1 kHz, with (at least) 13 bands above and 16 bands below.
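The band layout can be sketched as follows. This relies on our assumption, following the usual IEC 61260 formulation, that the octave ratio G is 10^(3/10) for the base-ten variant and exactly 2 for the base-two variant. The exact mid-band frequencies of 1/3-octave bands are then f = 1000 Hz × G^(x/3) for integer band index x, with band edges a sixth of an octave either side. The sketch lists the 30 bands from 16 below to 13 above 1 kHz.

```python
G_BASE_TEN = 10.0 ** 0.3  # octave ratio, base-ten variant (about 1.9953)
G_BASE_TWO = 2.0          # octave ratio, base-two variant
F_REF = 1000.0            # reference frequency in Hz

def third_octave_bands(g=G_BASE_TEN, lowest=-16, highest=13):
    """(lower edge, centre, upper edge) in Hz for band indices lowest..highest."""
    bands = []
    for x in range(lowest, highest + 1):
        centre = F_REF * g ** (x / 3)
        bands.append((centre * g ** (-1 / 6), centre, centre * g ** (1 / 6)))
    return bands

if __name__ == "__main__":
    for lo, centre, hi in third_octave_bands():
        print(f"{centre:9.1f} Hz  ({lo:8.1f} .. {hi:8.1f})")
```

The two variants give slightly different exact centres: the lowest band is about 25.1 Hz for base ten and about 24.8 Hz for base two. Both are usually labelled with the same rounded nominal frequency. Adjacent bands share their edges, so together they tile the audio range without gaps.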

In the digital domain, various alternative implementations are possible, most notably FFT and signal convolution approaches³.

The FFT (Fast Fourier Transform, an efficient algorithm for computing the discrete Fourier transform) transforms an audio signal from the time domain into the frequency domain. In its basic, common form, the frequency bands (bins) are equally spaced, and the analyser produces a flat response for white noise.
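Because FFT bins are equally spaced, an FFT-based analyser that sums bin power into 1/3-octave bands collects more bins, and hence more white-noise power, in the higher bands. The sketch below, assuming a 48 kHz sample rate and an 8192-point FFT (both illustrative choices), counts the bins per base-ten band. There are roughly ten times as many bins per decade, i.e. about 3 dB more white-noise power per octave, which is why pink noise (falling 3 dB per octave) reads flat on such a display.

```python
import math

fs = 48000.0                 # sample rate (assumed)
N = 8192                     # FFT length (assumed)
bin_hz = fs / N              # every FFT bin has the same width: ~5.86 Hz

def third_octave_edges(fm):
    """Base-ten IEC 61260 band edges around centre frequency fm."""
    half_band = 10.0 ** (3.0 / 10.0 / 6.0)    # G**(1/6) with G = 10**(3/10)
    return fm / half_band, fm * half_band

def bins_in_band(fm):
    """Number of FFT bins whose centre frequency k * bin_hz falls inside the band."""
    lo, hi = third_octave_edges(fm)
    return sum(1 for k in range(1, N // 2) if lo <= k * bin_hz < hi)

# White noise has the same expected power in every bin, so the power an
# FFT analyser collects per 1/3-octave band grows with the number of bins.
for fm in (100.0, 1000.0, 10000.0):
    n = bins_in_band(fm)
    print(f"{fm:7.0f} Hz band: {n:4d} bins ({10.0 * math.log10(n):4.1f} dB above one bin)")
```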

For musical applications, a variant called ‘perceptual analysers’ is widespread: the signal level or power is weighted depending on various factors. Perceptual analysers often feature averaging functions or make use of screen persistence to improve readability. They also come with additional features, such as a numeric readout of the average noise level and peak detection, to mitigate effects introduced by variation in the actual display.

2.9 Oscilloscopes 

The oscilloscope is the “jack of all trades” of 
electronic instrumentation tools. It produces a 
two-dimensional plot of one or more signals as 
a function of time. 

It differs from the casual waveform display often found in audio applications in various subtle but important details. An oscilloscope allows reliable signal measurement and numeric readout. Digital waveform displays, on the other hand, operate on audio samples, as opposed to the continuous audio signal that those samples represent. Figure 7 illustrates this.

For an oscilloscope to be useful for engineering work, it must be calibrated for both time and level, be able to produce an accurate readout of at least two channels, and facilitate the acquisition of particular events (triggering, signal history) [Adriaensen, 2013].
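The difference shown in Figure 7 can be sketched in a few lines. The example below reconstructs the 15 kHz, -3 dBFS sine of the figure from its 48 kHz samples with a Hann-windowed sinc interpolator. The kernel type and its 64-sample span are assumptions chosen for illustration, not necessarily what any particular oscilloscope uses. It then compares the result with drawing straight lines between samples, as a naive waveform display might.

```python
import math

def sinc(x):
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def bandlimited_value(samples, t, half_width=32):
    """Reconstruct the signal at fractional sample position t.

    Hann-windowed sinc kernel spanning +-half_width samples (an assumed choice).
    """
    n0 = math.floor(t)
    acc = 0.0
    for n in range(n0 - half_width + 1, n0 + half_width + 1):
        if 0 <= n < len(samples):
            d = t - n
            window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
            acc += samples[n] * sinc(d) * window
    return acc

def linear_value(samples, t):
    """What 'connecting the dots' between raw samples draws."""
    n = math.floor(t)
    frac = t - n
    return samples[n] * (1.0 - frac) + samples[n + 1] * frac

fs, f = 48000.0, 15000.0              # the signal of Figure 7
amp = 10.0 ** (-3.0 / 20.0)           # -3 dBFS
samples = [amp * math.sin(2.0 * math.pi * f * n / fs) for n in range(256)]

def true_value(t):
    """The continuous signal at position t (in samples)."""
    return amp * math.sin(2.0 * math.pi * f * t / fs)

# Evaluate 8 points per sample in the middle of the buffer, away from its edges.
positions = [k / 8.0 for k in range(8 * 96, 8 * 160)]
sinc_err = max(abs(bandlimited_value(samples, t) - true_value(t)) for t in positions)
line_err = max(abs(linear_value(samples, t) - true_value(t)) for t in positions)
print(f"max error, windowed-sinc reconstruction: {sinc_err:.4f}")
print(f"max error, straight lines between samples: {line_err:.4f}")
```

With these settings the band-limited reconstruction stays well within 1% of full scale, whereas straight lines between samples miss the true waveform by more than 10% of full scale.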

3 Standardisation 

The key point of measuring things is to be able to meaningfully compare readings from one meter to another, or to a mathematically calculated value. A useful analogy here is inches and centimetres: there is a rigorous specification of what distance means. There are various standards and conventions, but there is no margin for error: one can rely on the centimetre.

3 There are analogue designs that perform DFT techniques, but for all practical purposes they are inadequate and not comparable to digital signal processing.

Figure 7: 15 kHz, -3 dBFS sine wave sampled at 48 kSPS. The oscilloscope (top) up-samples the data to reproduce the signal; the waveform display (bottom) shows the raw sample data.

Unfortunately, the same rigour is not always applied to audio metering. On many products the included level meter mainly serves to enhance the aesthetics (“make it look cool”) rather than to provide a reliable measurement, a trend that increased with the proliferation of digital audio plugins. Such meters are not completely without merit: they can be useful to indicate the presence or absence of a signal, and most will place the signal level in the right ballpark. There is nothing wrong with saying “the building is tall”, but saying “the building is 324.1 m high” is more meaningful. The problem in the audio world is that many vendors add false numeric labels to the scale to convey a look of professionalism, which can be quite misleading.

In the audio sphere, the most prominent standards are the IEC and ITU specifications. These are designed such that all compliant meters, even when using completely different implementations, produce identical results.

The fundamental attributes that are specified 
for all meter types are: 

• Alignment or Reference Level and Range 

• Ballistics (rise/fall times, peak-hold, burst 
response) 

• Frequency Response (filtering) 
Standards (such as IEC, ITU, EBU, ...) govern many details beyond that, from visual colour indication to operating temperatures, analogue characteristics, electrical safety guidelines and test methods, down to robustness requirements against electrostatic and magnetic interference.

4 Software Implementation 
4.1 Meters.lv2 

Meters.lv2 [Gareus, 2013a] is a set of audio plugins, licensed under the terms of the GPLv2 [GPL, 1991], that provide professional audio-signal measurements according to various standards. It currently features needle-style meters (mono and stereo variants) of the following types:

• IEC 60268-10 Type I / DIN 

• IEC 60268-10 Type I / Nordic 

• IEC 60268-10 Type IIa / BBC

• IEC 60268-10 Type IIb / EBU

• IEC 60268-17 / VU 

An overview is given in Fig. 2. Furthermore 
it includes meter-types with various appropriate 
visualisations for: 

• 30-band 1/3-octave spectrum analyser according to IEC 61260 (see Fig. 6)

• Digital True-Peak Meter (4x oversampling), Type II rise-time, 13.3 dB/s fall-off

• EBU R128 Meter with Histogram and History (Fig. 3)

• K/RMS meter, K-20, K-14 and K-12 variants

• Stereo Phase Correlation Meter (Needle 
Display, bottom right in Fig. 2) 

• Goniometer (Stereo Phase Scope) (Fig. 8) 

• Phase/Frequency Wheel (Fig. 4) 

There is no official standard for the goniometer and the phase wheel; their displays have been eye-matched by experienced sound engineers to follow corresponding hardware equivalents.
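For the stereo phase displays, a common textbook formulation is sketched below. It is an assumption, not taken from Meters.lv2, and it ignores the integration time and ballistics a real correlation meter applies. The correlation meter reads the normalised cross-correlation of the two channels over a block, and a goniometer plots each sample pair rotated by 45 degrees, so that mid (L+R) runs vertically and side (L-R) horizontally.

```python
import math

def correlation(left, right):
    """Normalised cross-correlation of one block, in [-1, +1]."""
    lr = sum(l * r for l, r in zip(left, right))
    ll = sum(l * l for l in left)
    rr = sum(r * r for r in right)
    if ll == 0.0 or rr == 0.0:
        return 0.0   # silence on a channel: no phase relation (assumed convention)
    return lr / math.sqrt(ll * rr)

def goniometer_point(l, r):
    """Rotate one stereo sample pair by 45 degrees: side on x, mid on y.

    Orientation is an assumption: mono plots as a vertical line, left-only
    signals lean to the left and right-only signals to the right.
    """
    s = math.sqrt(0.5)
    return s * (r - l), s * (l + r)

ks = range(64)
sine = [math.sin(2.0 * math.pi * k / 64) for k in ks]
cosine = [math.cos(2.0 * math.pi * k / 64) for k in ks]
print(round(correlation(sine, sine), 3))                 # 1.0 (mono)
print(round(correlation(sine, [-v for v in sine]), 3))   # -1.0 (polarity inverted)
print(correlation(sine, cosine))                         # close to 0 (90 degrees apart)
```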

Particular care has been taken to make the software implementation safe for professional use, specifically regarding real-time safety and robustness (e.g. protection against denormal, or subnormal, input). The graphical display makes use of hardware acceleration (OpenGL) to minimise CPU usage.



Figure 8: Goniometer (Phase Scope) 

4.2 Sisco.lv2

Sisco.lv2 [Gareus, 2013c] implements a classic audio oscilloscope with variable time scale, triggering, cursors and numeric readout in LV2 plugin format. While it is feature-complete for an audio scope, it is rather simplistic compared to contemporary hardware oscilloscopes or similar endeavours by other authors [Adriaensen, 2013].

The minimum grid resolution is 50 microseconds, or a 32-times oversampled signal. The maximum buffer time is 15 seconds. Variants with up to four channels are currently available.

The time-scale setting is the only parameter that directly affects data acquisition; all other parameters act on the display of the data only. The vertical axis displays floating-point audio-sample values in the range [-1..+1]. The amplitude can be scaled by a factor in the range [-10..+10] (up to 20 dB); negative values invert the polarity of the signal. The numeric readout is not affected by amplitude scaling. Channels can be offset horizontally and vertically. The offset applies to the display only and does not span multiple buffers (the data does not extend beyond the original display). This allows the display to be adjusted in ‘paused’ mode after sampling a signal.

The oscilloscope allows visually hiding channels as well as freezing the current display buffer of each channel individually. Regardless of the display, data acquisition continues for every channel, and each channel can be used for triggering.
Figure 9: Overview of trigger preprocessor modes available in “mixtri.lv2”. The arrow indicates trigger position.


| # | Title | Description |
|---|---|---|
| 1 | Signal Edge | Signal passes ‘Level 1’. |
| 2 | Enter Window | Signal enters a given range (Level 1, 2). |
| 3 | Leave Window | Signal leaves a given range (Level 1, 2). |
| 4 | Hysteresis | Signal crosses both min and max (Level 1, 2) in the same direction without interruption. |
| 5 | Constrained | Signal remains within a given range for at least ‘Time 1’. |
| 6 | Drop-out | Signal does not pass through a given range for at least ‘Time 1’. |
| 7 | Pulse Width | Last edge-trigger occurred between min and max (Time 1, 2) ago. |
| 8 | Pulse Train | No edge-trigger for a given time (max, Time 2), or more than one trigger since a given time (min, Time 1). |
| 9 | Runt | Fire if signal crosses 1st but not 2nd threshold. |
| 10 | LTC | Trigger on Linear Time Code sync word. |
| 11 | RMS | Calculate RMS, integrate over ‘Time 1’ samples. |
| 12 | LPF | Low-pass filter, 1.0/‘Time 1’ Hz. |

Table 2: Description of trigger modes in Fig. 9. 

The scope has three modes of operation:

• No Triggering: the scope runs free, with the display update frequency depending on the audio buffer size and the selected time scale. For update frequencies below 10 Hz, a vertical bar marks the current acquisition position. This bar separates recent data (to the left) from previously acquired data (to the right).

• Single Sweep: acquisition is triggered manually using the push-button, honouring the trigger settings. Exactly one complete display buffer is acquired.

• Continuous Triggering: continuously triggered data acquisition with a fixed hold time between runs.
Advanced trigger modes are not directly included with the scope, but are implemented as a standalone “trigger preprocessor” plugin [Gareus, 2013b]; see Fig. 9 and Table 2. Trigger modes 1-5 concern analogue operation modes, while modes 6-9 are concerned with measuring digital signals.⁴ Modes 10-12 are pre-processor modes rather than trigger modes. Apart from the trigger and edge-mode selectors, “mixtri.lv2” provides two level and two time control inputs for configuration.
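Table 2 describes mode 11 as an RMS integrated over ‘Time 1’ samples and mode 12 as a low-pass filter at 1.0/‘Time 1’ Hz. A minimal sketch of both pre-processing steps follows. The running-sum RMS and the first-order (one-pole) filter are assumptions for illustration, since the filter order and structure used by mixtri.lv2 are not stated here.

```python
import math

def moving_rms(x, window):
    """Mode 11 sketch: RMS over the last `window` samples, via a running sum of squares."""
    out, acc = [], 0.0
    for n, v in enumerate(x):
        acc += v * v
        if n >= window:
            acc -= x[n - window] ** 2      # drop the sample leaving the window
        out.append(math.sqrt(max(acc, 0.0) / min(n + 1, window)))
    return out

def one_pole_lowpass(x, cutoff_hz, fs):
    """Mode 12 sketch: first-order low-pass with the given cutoff (assumed filter type)."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, out = 0.0, []
    for v in x:
        y += a * (v - y)
        out.append(y)
    return out

sine = [math.sin(2.0 * math.pi * k / 48) for k in range(480)]
print(round(moving_rms(sine, 48)[-1], 4))   # 0.7071: RMS of a unit sine over one period
```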

5 Conclusion 

An overview of various instrumentation tools and measurement standards was presented. The tools are available as free software and have already found their way into GNU/Linux distributions, making Linux an even more suitable platform for pro-audio work.
Abstract

This paper discusses the OpenAV [1] release system, a new release system built around a balance between release date and financial support.

The release system works by creating the software, 
announcing it, and releasing after a waiting time. If 
money is donated to the project, the waiting time 
is reduced, which in turn results in an accelerated 
release. 

This paper details the process of the OpenAV release system and discusses it in relation to other release systems. Finally, the author draws on the experience gained by OpenAV Productions.

Keywords 

Open-source, Funding, Software Release, OpenAV 
Productions 

1 Introduction 

This paper introduces the OpenAV release system, which is designed to financially support the developer of a software project while also ensuring that every project is released in source-code form.

Developers of open-source software often cannot work full time on a project due to financial constraints: they must earn money elsewhere in order to pay the bills. The OpenAV release system is designed to financially support a developer while working on an open-source project.

The main components of the release system are a target funding amount and a waiting time before the source code is released. These components represent a balance, in which both financial support and the passing of time contribute towards releasing the source code.

The outcome is always the same: the source code is released. The only variable is how much money the developer receives for their effort.

2 Background 

In order to compare the OpenAV release system to existing funding and release systems, a selection of well-known crowd-funding platforms and funding models is introduced below. Each section gives a short introduction to the platform itself and a description of its unique features.

2.1 Kickstarter 

Kickstarter is a funding platform where projects are advertised and can be donated to by members of the public. Project proposals are posted, usually with a video and a blog post to gain momentum for the idea. The funding model is one of proposing an idea and then attempting to collect the full amount of money: “Funding on Kickstarter is all-or-nothing - projects must reach their funding goals to receive any money” [2].

2.2 OpenInitiative.com 

OpenInitiative uses a pay-per-item model: developers suggest work on a project or feature, and users can then contribute to each feature or project in order to have the work done.

The unique feature is that “the developer determines the delivery date when the project is finished. Users then have 14 days to validate the result or request corrections. The developer is paid only after validation by the users” [3].

2.3 Snowdrift.coop 

Snowdrift.coop is a new platform for funding projects, where contributors pay more depending on how many others contribute to the same goal: “I’ll donate more if more people join me” [4].

This leads to a funding model that grows 
along with the projects it supports, making it 
sustainable for long-term funding. 

2.4 Subscriptions and Donations 

Allowing donations and/or subscriptions may provide financial support for a developer. The amount donated will vary depending on the number of users directly benefiting from a project.
The motivation for donating to or subscribing to a project is often moral: generally, donations come with no promise of a direct effect on the release.

3 OpenAV Release System 

The OpenAV release system is geared towards providing software to the Linux audio community, while also financially rewarding the developer for the time spent developing it.

The unique feature of this release system is that a release is not achieved by donating a fixed amount of money: instead, a trade-off between time and money dictates when the release occurs.

3.1 Design decisions 

When OpenAV Productions was set up, the author researched how it could be financially supported while also releasing open-source code.

It became clear that a new funding and release system could be more appropriate for developing and financially supporting projects than the existing solutions (e.g. Kickstarter).

The concept of setting a trade-off between release time and financial support became the core of the OpenAV release system: contributors financially support the release date, rather than the product itself.

The developer commits to making an open-source release regardless of financial support, which ensures that the work done will become available to the commons. At the same time, there is an incentive to financially support the developer, as donations accelerate the release of the code.

3.2 Procedure 

The stages of the release process are presented, 
after which each stage is detailed. 

• Creation: the project is developed to a 1.0 level of features and testing.

• Announcement: the project is demonstrated, including what its purpose is and how to use it.

• Releasing: the project’s source code is made available.

3.2.1 Creation 

In the first stage of the OpenAV release system, the developer writes the software. During this stage they have the option to publicly consult the community about the project if they so wish.


On completion of the features for a 1.0 release, testing is performed to verify that the software is stable. When testing OpenAV software, a group of trusted users is provided with the source code and asked not to re-share it. They can then use the software and report any bugs they encounter.

When testing of the code has completed, the 
project is announced. 

3.2.2 Announcement 

In the announcement, the developer demonstrates the software, what its purpose is, and what its features are. A good announcement makes it obvious to its readers how they would benefit from the availability of the project.

The announcement of the project includes two important factors for the release: the target amount and the waiting time. The target amount represents the financial support the developer wishes to receive in return for creating the software. The waiting time is the amount of time that must pass before a release is made if no financial support is received.

The waiting time starts counting down from the date the announcement is made, and financial support in the form of donations is also welcomed from this date.

3.2.3 Releasing 

The project is released when one of three situations occurs. These three situations are summarized, and then explained:

• The target amount of financial support is 
reached 

• The waiting time expires, without financial 
support 

• A combination of financial support and 
waiting time passing, as shown by: 
Financial contributions + Waiting Time = 
Target Amount. 

Financial Support Target Reached 

The target amount of money is reached by financial contributions alone. The developer has received the amount of financial support that they requested for an immediate release.

Waiting Time Expires 

The waiting time for the project has passed: the project release is made without any financial contribution. The developer does not receive any financial support for their efforts.
Combination of Finance and Time

The target amount has been reached partially by financial support and partially by time passing. The developer has received some financial reward for their effort, and some time has passed, adding up to the target amount.

Releasing 

When any of the above three situations occur, 
the developer releases the source-code online 
allowing access to all. 
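The release rule can be sketched as a small model. It assumes, as an illustration only, that elapsed time counts linearly towards the target, so that after half the waiting time only half the target amount is still needed; the class and method names are hypothetical. The example uses Sorcer's terms from Table 1 (€120 target, one-year wait).

```python
from dataclasses import dataclass

@dataclass
class ReleaseClock:
    target: float      # target amount, e.g. 120.0 (EUR)
    wait_days: float   # waiting time if nothing is donated, e.g. 365.0

    def time_credit(self, elapsed_days):
        """Assumed: elapsed time counts linearly towards the target amount."""
        return self.target * min(elapsed_days, self.wait_days) / self.wait_days

    def is_released(self, donated, elapsed_days):
        """Financial contributions + waiting time (as credit) >= target amount."""
        return donated + self.time_credit(elapsed_days) >= self.target

    def days_remaining(self, donated, elapsed_days):
        """Days until release if no further donations arrive."""
        remaining = max(self.target - donated - self.time_credit(elapsed_days), 0.0)
        return remaining / self.target * self.wait_days

clock = ReleaseClock(target=120.0, wait_days=365.0)
print(clock.is_released(donated=60.0, elapsed_days=100.0))               # False
print(round(clock.days_remaining(donated=60.0, elapsed_days=100.0), 1))  # 82.5
```

Under this assumption every euro donated brings the release forward by wait_days / target days, about three days per euro for Sorcer's terms.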

4 Parties involved 

This section details the point of view of the various parties involved in the OpenAV release system. Each party has specific positive and negative aspects with regard to its relation to the release system.

4.1 The Developer 

When a developer uses the OpenAV release model, they are responsible for the whole environment in which the software is produced, both financially and in terms of the code.

The software has to be written without financial support, as the developer only receives financial support after the project’s announcement.

An announcement must be prepared which shows off the features of the program. This is generally not necessary when releasing code, so it could be considered extra work that the developer must do. OpenAV Productions has used demonstration videos and blog-style posts to publicize the software’s functionality.

The previously prepared content must be broadcast to as large a user base as possible: this involves using social media extensively, writing emails to mailing lists, and posting on fora.

After completing the project and the demonstrative content, and announcing it, the developer waits for financial contributions. If contributions arrive, the release clock is updated accordingly; otherwise, the waiting time is reduced as time passes.

4.2 The Contributor 

When the OpenAV release system is in use, certain members of the community may decide to financially support the project. The donation accelerates the release of the project, but does not have any immediate result for the contributor.


In return, the developer could list the contributor’s name, IRC nick or online handle on the project page to show their appreciation. OpenAV Productions lists contributors only after they have confirmed that they are comfortable with this and have indicated the name they would like to be publicized. This maintains absolute privacy for contributors who wish to remain anonymous.

4.3 The Library Developer 

The authors of the libraries on which the released project is based make up this group of people. Although they are perhaps not directly involved in the OpenAV release model, the author feels it is worth mentioning the library developers as an involved party, as their code is used by a project that is being financially supported by the community.

The project developer has the choice to donate some of the financial contributions they received to the library developers; however, they are under no obligation to do so.

There is the possibility that a library developer doesn’t agree with the release model being used by the project. Assuming that the license of the code in question is not violated, one could say that it is irrelevant whether the library developer agrees with the release model: the license they chose is adhered to.

However, the fact that money is exchanged, and that library developers might not personally agree with the funding model, is worth noting here.

4.4 The Remaining Community 

The final “catch-all” group contains the community members who are not directly involved in the creation or funding of the project. Upon the release of the project, they gain source access to the project too.

In the author’s opinion, the fact that the whole community benefits from certain members financially supporting the developer is the ultimate success of the OpenAV release system.

4.5 Statistics 

This section introduces statistics on the finances that OpenAV has received while working with the release system.

The data presented in Table 1 shows details of the projects released by OpenAV Productions at the time of writing. The columns show the project title, hours spent developing the project, target funding amount, waiting time, and number of days before the project was funded.


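| Project | Time (hours) | Target | Wait | Days to funding |
|---|---|---|---|---|
| Sorcer | 90 | €120 | 1 year | 9 |
| Fabla | 110 | €120 | 1 year | 8 |
| ArtyFX | 120 | €120 | 1 year | 5 |
| Luppp | 480 | €520 | 1 year | 5 |
| ArtyFX 1.1 | 70 | €120 | 1 year | 8 |
| Total | 870 | €1000 | - | - |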

Table 1: Details of projects released by OpenAV 
Productions at time of writing. 


4.6 Time 

While developing the OpenAV projects, the author has kept track of the time spent developing, using time-tracker software that provides breakdowns of time spent and total hours.

Since the release of Sorcer in May, the total amount of time spent developing code for OpenAV Productions is approx. 870 hours. This figure does not include the development of Sorcer or Fabla (since they were started before Sorcer’s release); however, it does include work on some currently unannounced projects.

5 Discussion 

This section discusses different aspects of the OpenAV release system. Various points of view are discussed with regard to the unique features of the OpenAV release system, compared to the other release systems presented in the background section.

5.1 Trust and Reputation 

This section discusses the topic of trust between 
the developer and the community that applies 
to each funding model. 

5.1.1 Code Quality 

The OpenAV release model requires a basis of trust between the community and the developer. This trust takes the form of community members contributing financially because they believe that the quality and stability of the program are worth funding.

This trust can be built up over time by each 
developer by making smaller contributions, or 
releasing some code to prove their capabilities. 


5.1.2 Target Funding 

On announcement, the developer defines the 
target amount of financial support when using 
the OpenAV release system. 

When supporting a piece of software, contributors must decide whether they think the developer’s efforts are worth the money being asked for. This decision involves the contributor’s trust in the developer’s estimate of the price, as well as their personal evaluation of the desirability of the resulting project.

5.1.3 Waiting Time and Funding 

Upon announcing a project using the OpenAV release system, the developer must choose a waiting time after which the project is released even if no funding is received.

This waiting time is the trade-off for financial contributions: a good balance between waiting time and target amount will motivate people to contribute to the project, because their contribution makes a significant improvement to the release date.

5.2 Release System Comparisons 

This section discusses how the OpenAV release system compares with the other release systems presented in the background section: Kickstarter, OpenInitiative and Snowdrift.

5.2.1 Motivation for development 

Kickstarter can be used to gain financial capital 
for commercial and closed source profit. It does 
not imply that the resulting software / project 
is released as open source. 

In contrast, the OpenAV release system incorporates a promise from the creator that the result of their work will be shared as open source regardless of the amount of funding that they may receive.

This fundamental difference in funding motivation is an interesting one to consider when discussing funding models for open-source software.

5.2.2 Financial support 

There are various options as to when a developer receives funding.

Kickstarter uses a “propose-fund-work” system, which means that at worst the developer only makes a proposal and, if it is not funded, doesn’t have to do any more work. OpenInitiative breaks this down into smaller stages, for a more finely grained “propose-fund-work” system.
Snowdrift takes a totally different angle, and supports the developer financially without them having to commit to developing a certain feature.

The OpenAV release system takes a novel approach, which involves the developer doing all the work and then hoping to be financially supported for the time spent on it.

By using the OpenAV release system, a developer must accept that the initial development of the program is done without financial support.

5.2.3 Outcome of project 

This section discusses the outcome of each 
project, based on the funding method used. 

When funded with Kickstarter or OpenInitiative, if a proposal doesn’t get funded then the work isn’t completed. This means that the developer doesn’t have to spend their own free time completing the work, but also that the community doesn’t benefit from the work.

Using the OpenAV release system, the developer takes on the risk of doing the work and hoping to be financially supported later. In this way, the release model is more demanding for the developer and less demanding of the community.

A positive aspect of the OpenAV release 
model is that the community can see the work 
done, and if they value it, they can contribute to 
the project in order to have it released sooner. 

5.3 Financial Viability 

This section deals with the financial viability of 
doing full time development of software using 
the OpenAV release system. 

As presented in Section 4.5 (Statistics), Table 1 shows each project, the approximate amount of time spent on it, and the amount of financial support received for it.

Each project was released with 100% funding. 
This shows that the community are willing to 
financially support developers using this release 
model. 

The hourly rate of pay is about €1.15. In 
order to make a living from releasing software 
by OpenAV, the target amounts would need to 
increase at least tenfold. 

A tenfold increase in support would set the hourly rate at approx. €12, which, for 40-hour weeks and 40 weeks a year, would result in a gross annual wage of roughly €19,000 to €20,000.
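The rate and wage figures follow from Table 1 and the stated assumptions (40-hour weeks, 40 weeks a year):

$$
\frac{\text{€}1000}{870\ \text{h}} \approx \text{€}1.15/\text{h}, \qquad 10 \times \text{€}1.15/\text{h} \approx \text{€}11.5/\text{h} \approx \text{€}12/\text{h}
$$

$$
40\ \tfrac{\text{h}}{\text{week}} \times 40\ \text{weeks} = 1600\ \text{h}, \qquad 1600\ \text{h} \times \text{€}12/\text{h} = \text{€}19\,200
$$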


The author feels that it is possible to achieve 
enough financial support to work full time on 
open source software using this release system. 

6 Conclusion 

This paper has presented the OpenAV release 
system, a new funding and release model that 
is geared towards small open source software 
projects. 

A detailed procedure of how the OpenAV release system works was given, and the system was then discussed with regard to other funding and release systems, including Kickstarter, OpenInitiative and Snowdrift.

In the financial viability section the author 
draws from the experience gained from using 
the OpenAV release system for four software 
projects. 

The author intends to continue using the OpenAV release model, perhaps one day being supported enough to work full time on open-source projects.
Abstract

This paper reports on two approaches to providing general-purpose audio programming support for web applications based on Csound. It reviews the current state of web audio development and discusses some previous attempts at this. We then introduce a JavaScript version of Csound that has been created using the Emscripten compiler, and discuss its features and limitations. Complementing this, we look at a Native Client implementation of Csound, which is a fully functional version of Csound running in the Chrome and Chromium browsers.

Keywords 

Music Programming Languages; Web Applications; 

1 Introduction 

The web browser has become an increasingly viable platform for the creation and distribution of various types of media computing applications [Wyse and Subramanian, 2013]. It is no surprise that audio is an important part of these developments. For a good while now, we have been interested in the possibilities of deploying client-side Csound-based applications, in addition to the already existing server-side capabilities of the system. Such scenarios would be ideal for various uses of Csound. For instance, in education, we could see the easy deployment of computer music training software for all levels, from secondary schools to third-level institutions. For researchers, web applications can provide an easy means of creating prototypes and demonstrations. Composers and media artists can also benefit from the wide reach of the internet to create portable works of art. In summary, given the right conditions, Csound can provide a solid and robust general-purpose audio development environment for a variety of uses. In this paper, we report on the progress towards supporting these conditions.


2 Audio Technologies for the Web 

The current state of audio systems for world-wide web applications is primarily based upon three technologies: Java¹, Adobe Flash² and HTML5 Web Audio³. Of the three, Java is the oldest. Applications using Java are deployed via the web either as Applets⁴ or via Java Web Start⁵. Java as a platform for web applications has lost popularity since its introduction, primarily due to historically sluggish start-up times as well as concerns over security breaches. Also of concern is that major browser vendors have either completely disabled Applet loading or disabled it by default, and that NPAPI plugin support, with which the Java plugin for browsers is implemented, is planned to be dropped in future browser versions⁶. While Java sees strong support on the server side and the desktop, its future as a platform for web-deployed applications is tenuous at best, and it is difficult to recommend for future audio system development.

Adobe Flash as a platform has seen large-scale support across platforms and across browsers. Numerous large-scale applications have been developed, such as AudioTool⁷, Patchwork⁸ and Noteflight⁹. Flash developers can choose to deploy to the web using the Flash plugin, as well as use Adobe Air¹⁰ to deploy to desktop and mobile devices. While these applications demonstrate what can be developed for the web using Flash, the Flash platform itself has a number of drawbacks. The primary tools for Flash development are closed-source, commercial applications that are unavailable on Linux, though open-source Flash compilers and IDEs do exist¹¹. There has been a backlash against Flash in browsers, most famously by Steve Jobs and Apple¹², and the technology stack as a whole has seen limited development with the growing popularity of HTML5. At this time, Flash may be a viable platform for building audio applications, but its uncertain future makes it difficult to recommend.

1 http://java.oracle.com

2 http://www.adobe.com/products/flashruntimes.html

3 http://www.w3.org/TR/webaudio/

4 http://docs.oracle.com/javase/tutorial/deployment/applet/index.html

5 http://docs.oracle.com/javase/tutorial/deployment/webstart/index.html

6 http://blog.chromium.org/2013/09/saying-goodbye-to-our-old-friend-npapi.html

7 http://www.audiotool.com/

8 http://www.patchwork-synth.com

9 http://www.noteflight.com

10 http://www.adobe.com/products/air.html

Finally, HTML5 Web Audio is the most recent of these technologies for web audio applications. Examples include the “Recreating the sounds of the BBC Radiophonic Workshop using the Web Audio API” site¹³, Gibberish¹⁴, and WebPd¹⁵. Unlike Java or Flash, which are implemented as browser plug-ins, the Web Audio API is a W3C proposed standard that is implemented by the browser itself. Having built-in support for audio removes the security issues and concerns over the future of plug-ins that affect Java and Flash. However, the Web Audio API has limitations that are explored further in the section on Emscripten.

3 Csound-based Web Application Design

Csound is a music synthesis system that has roots in the very earliest history of computer music. Csound use in Desktop and Mobile applications has been discussed previously in [Lazzarini et al., 2012b], [Yi and Lazzarini, 2012], and [Lazzarini et al., 2012a].

Prior to the technologies presented in this paper, Csound-based web applications have employed Csound mostly on the server side. For example, NetCsound¹⁷ allows sending a CSD file to the server, where it would render the project to disk and email the user a link to the rendered file when complete. Another use of Csound on the server is Oeyvind Brandtsegg’s VLBI Music¹⁸, where Csound is running on the server and publishes its audio output to an audio stream that end users can listen to. A similar architecture is found in [Johannes and Toshihiro, 2013]. Since version 6.02, Csound also includes a built-in server that can be activated through an option on start-up. The server is able to receive code directly through UDP connections and compile it on the fly.

11 http://www.flashdevelop.org/
12 http://www.apple.com/hotnews/thoughts-on-flash/
13 http://webaudio.prototyping.bbc.co.uk/
14 Available at https://github.com/charlieroberts/Gibberish, discussed in [Roberts et al., 2013]
15 https://github.com/sebpiq/WebPd
16 http://caniuse.com/audio-api lists current browsers that support the Web Audio API
17 Available at http://dream.cs.bath.ac.uk/netcsound/, discussed in [ffitch et al., 2007]

Using Csound server-side has both positives and negatives that should be evaluated against a project’s requirements. It can be appropriate if the project’s design calls for a single audio stream/Csound instance that is shared by all listeners. In this case, users might interact with the audio system over the web, at the expense of network latency. Using multiple realtime Csound instances, as would be the case if there were one per user, would certainly be taxing for a single server and would require careful resource limiting. For multiple non-realtime Csound instances, as in the case of NetCsound, multiple jobs may be scheduled and batch processed with fewer problems than with realtime systems, though resource management is still a concern.

An early project to employ client-side audio computation by Csound was described in [Casey and Smaragdis, 1996], where a sound and music description system was proposed for the rendering of network-supplied data streams. A possibly more flexible way to use Csound in client-side applications, however, is to use the web browser as a platform. Two attempts at this have been made in the past. The first was the now-defunct ActiveX Csound (also known as AXCsound)¹⁹, which allowed embedding Csound into a webpage as an ActiveX Object. This technology is no longer maintained and was only available for use on Windows with Internet Explorer. A second attempt was made in the Mobile Csound Project [Lazzarini et al., 2012b], where a proof-of-concept Csound-based application was developed with Java and deployed using Java Web Start, achieving client-side Csound use via the browser. However, the technology required special permissions to run on the client side and required Java to be installed. Due to those issues and the uncertain future of Java over the web, the solution was not further explored.

18 http://www.researchcatalogue.net/view/55360/55361
19 We were unable to find a copy of this online, but one is available from the CD-ROM included with [Boulanger, 2000].

The two systems described in this paper are browser-based solutions that run on the client side. They both share the following benefits:

• Csound has a large array of signal processing opcodes that are made immediately available to web-based projects.

• They are compiled using the same source code as is used for the desktop and mobile versions of Csound. They only require recompiling to keep them in sync with the latest Csound features and bug fixes.

• Csound code that can be run with these browser solutions can be used on other platforms. Audio systems developed using Csound code are then cross-platform across the web, desktop, mobile, and embedded systems (e.g. Raspberry Pi, BeagleBone; discussed in [Batchelor and Wignall, 2013]). Developers can reuse their audio code from their web-based projects elsewhere, and vice versa.

4 Emscripten 

Emscripten is a project created by Alon Zakai at the Mozilla Foundation that compiles the assembly language used by the LLVM compiler into Javascript [Zakai, 2011]. When used in combination with LLVM’s Clang frontend, Emscripten allows applications written in C/C++, or languages that use C/C++ runtimes, to be run directly in web browsers. This eliminates the need for browser plugins and takes full advantage of web standards that are already in common use.

In order to generate Javascript from C/C++ source code, the codebase is first compiled into LLVM assembly language using LLVM’s Clang frontend. Emscripten translates the resulting LLVM assembly language into Javascript, specifically an optimised subset of Javascript entitled asm.js. The asm.js subset of Javascript is intended as a low-level target language for compilers and allows a number of optimisations which are not possible with standard Javascript²⁰. Code semantics which differ between Javascript and LLVM assembly can be emulated when accurate code is required. Emscripten has built-in methods to check for arithmetic overflow, signing issues and rounding errors. If emulation is not required, code can be translated without semantic emulation in order to achieve the best execution performance [Zakai, 2011].

20 http://asmjs.org/spec/latest/

Implementations of the C and C++ runtime libraries have been created for applications compiled with Emscripten. These allow programs written in C/C++ to transparently perform common tasks such as using the file system, allocating memory and printing to the console. Emscripten allows a virtual filesystem to be created using its FS library, which is used by Emscripten’s libc and libcxx for file I/O²¹. Files can be added to or removed from the virtual filesystem using Javascript helper functions. It is also possible to directly call C functions from Javascript using Emscripten²². These functions must first be named at compile time so they are not optimised out of the resulting compiled Javascript code. The required functions are then wrapped using Emscripten’s cwrap function and assigned to a Javascript function name. The cwrap function allows many Javascript variables to be used transparently as arguments to C functions, such as passing Javascript strings to functions which require the C language’s const char * string type.
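Conceptually, passing a Javascript string to a `const char *` parameter means encoding it as UTF-8, appending a NUL terminator, copying the bytes into the compiled module's linear memory, and handing the C function the resulting offset as its pointer. The Python model below is a simplified illustration of that idea, not Emscripten's actual code. The `bytearray` stands in for the module heap, and `write_c_string` and `read_c_string` are hypothetical helpers.

```python
def write_c_string(heap: bytearray, ptr: int, s: str) -> int:
    """Copy s into heap at ptr as a NUL-terminated UTF-8 byte sequence.

    Returns ptr so it can be passed on like a C `const char *`.
    """
    data = s.encode("utf-8") + b"\x00"
    if ptr + len(data) > len(heap):
        raise MemoryError("string does not fit in the simulated heap")
    heap[ptr:ptr + len(data)] = data
    return ptr

def read_c_string(heap: bytearray, ptr: int) -> str:
    """Read bytes from ptr up to (not including) the first NUL, as C code would."""
    end = heap.index(0, ptr)
    return heap[ptr:end].decode("utf-8")

heap = bytearray(64)                       # stand-in for the module's linear memory
p = write_c_string(heap, 16, "test.csd")   # the "pointer" a wrapper would pass to C
print(read_c_string(heap, p))              # test.csd
```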

Although Emscripten can successfully compile a large body of C/C++ code, there are still a number of limitations to this approach due to constraints of the Javascript language and runtime. As Javascript doesn’t support threading, Emscripten is unable to compile codebases that make use of threads. Some concurrency is possible using web workers, but they do not share state. It is also not possible to directly implement 64-bit integers in Javascript, as all numbers are represented using 64-bit doubles. This results in a risk of rounding errors being introduced to the compiled Javascript when performing arithmetic operations with 64-bit integers [Zakai, 2011].
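The rounding risk follows from the 53-bit mantissa of a double: integers beyond 2^53 cannot all be represented exactly. Python floats are also IEEE-754 doubles, so the short demonstration below shows the same loss a naive mapping of 64-bit integers onto Javascript numbers would suffer. It illustrates the numeric problem only and does not model how Emscripten itself handles 64-bit values.

```python
# Python floats and JavaScript numbers are both IEEE-754 doubles, so float()
# applies the same rounding a JS engine would.
MAX_SAFE_INTEGER = 2**53 - 1  # equals JavaScript's Number.MAX_SAFE_INTEGER

def as_double(n: int) -> int:
    """Round-trip an integer through a double, as storing it in a JS number would."""
    return int(float(n))

exact = as_double(MAX_SAFE_INTEGER)        # 9007199254740991: still exact
rounded = as_double(MAX_SAFE_INTEGER + 2)  # 2**53 + 1 cannot be represented
print(rounded)                             # 9007199254740992

# Arithmetic suffers too: adding 1 to 2**53 is lost entirely.
print(float(2**53) + 1.0 == float(2**53))  # True
```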

4.1 CsoundEmscripten 

CsoundEmscripten is an implementation of the Csound language in Javascript using the Emscripten compiler. A working example of CsoundEmscripten can be found at http://eddyc.github.io/CsoundEmscripten/. The compiled Csound library and CsoundObj Javascript class can be found at https://github.com/eddyc/CsoundEmscripten/. CsoundEmscripten consists of three main modules:

• The Csound library compiled to Javascript using Emscripten.

• A structure and associated functions written in C, named CsoundObj, implemented on top of the Csound library and compiled to Javascript using Emscripten.

• A handwritten Javascript class, also named CsoundObj, that contains the public interface to CsoundEmscripten. The Javascript class both wraps the compiled CsoundObj structure and associated functions, and connects the Csound library to the Web Audio API.

21 https://github.com/kripken/emscripten/wiki/Filesystem-API
22 https://github.com/kripken/emscripten/wiki/Interacting-with-code

4.1.1 Wrapping the Csound C API for use with Javascript

In order to simplify the interface between the Csound C API and the Javascript class containing the CsoundEmscripten public interface, a structure named CsoundObj and a number of functions which use this structure were created. The structure contains a reference to the current instance of Csound, references to Csound’s input and output buffers, and Csound’s 0dBFS value. Some of the functions that use this structure are:

• CsoundObj_new() - This function allocates and returns an instance of the CsoundObj structure. It also initialises an instance of Csound and disables Csound’s default handling of sound I/O, allowing Csound’s input and output buffers to be used directly.

• CsoundObj_compileCSD(self, filePath, samplerate, controlrate, buffersize) - This function is used to compile CSD files. It takes as its arguments: a pointer to the CsoundObj structure self, the address of a CSD file given by filePath, a specified sample rate given by samplerate, a specified control rate given by controlrate and a buffer size given by buffersize. The CSD file at the given address is compiled using these arguments.

• CsoundObj_process(self, inNumberFrames, inputBuffer, outputBuffer) - This function copies audio samples to Csound’s input buffer and copies samples from Csound’s output buffer. It takes as its arguments: a pointer to the CsoundObj structure self, an integer inNumberFrames specifying the number of samples to be copied, a pointer to a buffer containing the input samples named inputBuffer, and a pointer to a destination buffer named outputBuffer into which the output samples are copied.
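The body of CsoundObj_process is not shown here. One plausible shape, sketched in Python under stated assumptions, is to move the host buffer through Csound one control period (ksmps frames) at a time: copy a block into the input buffer, run one control period, and copy the block out. `perform_ksmps` is a hypothetical stand-in for csoundPerformKsmps(), the two lists stand in for Csound's interleaved input and output buffers, and input and output are assumed to have the same channel count.

```python
from typing import Callable, List

def process_in_ksmps_blocks(
    in_frames: int,
    input_buffer: List[float],
    output_buffer: List[float],
    ksmps: int,
    nchnls: int,
    perform_ksmps: Callable[[List[float], List[float]], None],
) -> int:
    """Run interleaved host audio through a ksmps-sized engine, one block at a time.

    Returns the number of control periods performed.
    """
    if in_frames % ksmps != 0:
        raise ValueError("host buffer size must be a multiple of ksmps")
    block = ksmps * nchnls           # interleaved samples per control period
    spin = [0.0] * block             # stand-in for Csound's input buffer
    spout = [0.0] * block            # stand-in for Csound's output buffer
    periods = 0
    for start in range(0, in_frames * nchnls, block):
        spin[:] = input_buffer[start:start + block]    # host -> Csound input
        perform_ksmps(spin, spout)                      # one control period
        output_buffer[start:start + block] = spout      # Csound output -> host
        periods += 1
    return periods

def half_gain(spin: List[float], spout: List[float]) -> None:
    """Toy 'orchestra': output is the input at half amplitude."""
    for i, x in enumerate(spin):
        spout[i] = 0.5 * x

frames, ksmps, nchnls = 512, 64, 2
inp = [1.0] * (frames * nchnls)
out = [0.0] * (frames * nchnls)
print(process_in_ksmps_blocks(frames, inp, out, ksmps, nchnls, half_gain))  # 8
```

A real wrapper would need to carry leftover frames between callbacks when the host buffer is not a multiple of ksmps. This sketch simply rejects that case.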

Each of the other functions that use the CsoundObj structure simply wraps an existing function in the Csound C API. The relevant functions are:

• csoundGetKsmps(csound) - This function takes as its argument a pointer to an instance of Csound and returns the number of specified audio frames per control sample.

• csoundGetNchnls(csound) - This function takes as its argument a pointer to an instance of Csound and returns the number of specified audio output channels.

• csoundGetNchnlsInput(csound) - This function takes as its argument a pointer to an instance of Csound and returns the number of specified audio input channels.

• csoundStop(csound) - This function takes as its argument a pointer to an instance of Csound and stops the current performance pass.

• csoundReset(csound) - This function takes as its argument a pointer to an instance of Csound and resets its internal memory and state in preparation for a new performance.

• csoundSetControlChannel(csound, name, val) - This function takes as its arguments a pointer to an instance of Csound, a string given by name, and a number given by val. It sets the numerical value of the Csound control channel specified by the string name.

The CsoundObj structure and associated functions are compiled to Javascript using Emscripten and added to the compiled Csound Javascript library. Although this is not necessary, keeping the compiled CsoundObj structure and functions in the same file as the Csound library makes it more convenient when including CsoundEmscripten within web pages.
4.1.2 The CsoundEmscripten Javascript interface

The last component of CsoundEmscripten is the CsoundObj Javascript class. This class provides the public interface for interacting with the compiled Csound library. As well as allocating an instance of Csound, this class provides methods for controlling performance and setting the values of Csound’s control channels. Additionally, this class interfaces with the Web Audio API, providing Csound with samples from the audio input bus and copying samples from Csound to the audio output bus. Audio I/O and the Csound process are performed in Javascript using the Web Audio API’s ScriptProcessorNode. This node allows direct access to input and output samples in Javascript, allowing audio processing and synthesis using the Csound library.
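One detail such a bridge presumably has to handle is sample layout. Web Audio exposes one array per channel in a ScriptProcessorNode callback (via AudioBuffer.getChannelData()), whereas Csound's input and output buffers hold interleaved frames (L0, R0, L1, R1, ...). The Python sketch below shows the conversion in both directions, with plain lists standing in for Float32Array. The helper names are illustrative, not taken from CsoundEmscripten.

```python
from typing import List

def deinterleave(interleaved: List[float], nchnls: int) -> List[List[float]]:
    """Split [L0, R0, L1, R1, ...] into one list per channel."""
    if len(interleaved) % nchnls != 0:
        raise ValueError("buffer length must be a multiple of nchnls")
    return [interleaved[ch::nchnls] for ch in range(nchnls)]

def interleave(planar: List[List[float]]) -> List[float]:
    """Merge per-channel lists back into a single interleaved list."""
    frames = len(planar[0])
    if any(len(ch) != frames for ch in planar):
        raise ValueError("all channels must have the same length")
    return [planar[ch][i] for i in range(frames) for ch in range(len(planar))]

spout = [0.1, -0.1, 0.2, -0.2, 0.3, -0.3]   # 3 stereo frames, interleaved
left, right = deinterleave(spout, 2)
print(left)   # [0.1, 0.2, 0.3]
print(right)  # [-0.1, -0.2, -0.3]
```

In the output direction, a callback would deinterleave Csound's output into the per-channel output arrays. In the input direction, it would interleave the per-channel input arrays into Csound's input buffer.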

Csound can be used in any webpage by creating an instance of CsoundObj and calling the available public methods in Javascript. The methods available in the CsoundObj class are:

• compileCSD(fileName) - This method takes as its argument the address of a CSD file fileName and compiles it for performance. The CSD file must be present in Emscripten’s virtual filesystem. This method calls the compiled C function CsoundObj_compileCSD. It also creates a ScriptProcessorNode instance for audio I/O.

• enableAudioInput() - This method enables audio input to the web browser. When called, it triggers a permissions dialogue in the host web browser requesting permission to allow audio input. If permission is granted, audio input is available for the running Csound instance.

• startAudioCallback() - This method connects the ScriptProcessorNode to the audio output and, if required, the audio input. The ScriptProcessorNode’s audio processing callback is also started. During each callback, if required, audio samples from the ScriptProcessorNode’s input are copied into Csound’s input buffer and any new values for Csound’s software channels are set. Csound’s csoundPerformKsmps() function is called and any output samples are copied into the ScriptProcessorNode’s output buffer.

• stopAudioCallback() - This method disconnects the currently running ScriptProcessorNode and stops the audio process callback. If required, this method also disconnects any audio inputs.

• addControlChannel(name, initialValue) - This method adds an object to a Javascript array that is used to update Csound’s named channel values. Each object contains a string value given by name, a float value given by initialValue and, additionally, a boolean value indicating whether the float value has been updated.

• setControlChannelValue(name, value) - This method sets a named control channel given by the string name to the number given by the value argument.

• getControlChannelValue(name) - This method returns the current value of a named control channel given by the string name.
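The "updated" flag described for each channel object lets the audio callback forward only values that changed since the previous callback. A minimal Python sketch of that bookkeeping follows. It uses a dict keyed by name rather than the array the text describes. The class, `flush`, and the choice to mark new channels as updated so their initial value is sent once are all assumptions, and `send` stands in for csoundSetControlChannel().

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ControlChannel:
    name: str
    value: float
    updated: bool = True  # assumption: push the initial value on the first callback

class ControlChannelTable:
    """Hypothetical mirror of addControlChannel / setControlChannelValue / getControlChannelValue."""

    def __init__(self) -> None:
        self._channels: Dict[str, ControlChannel] = {}

    def add(self, name: str, initial_value: float) -> None:
        self._channels[name] = ControlChannel(name, initial_value)

    def set(self, name: str, value: float) -> None:
        ch = self._channels[name]
        ch.value = value
        ch.updated = True

    def get(self, name: str) -> float:
        return self._channels[name].value

    def flush(self, send: Callable[[str, float], None]) -> int:
        """Send only changed values (e.g. once per audio callback); return how many were sent."""
        sent = 0
        for ch in self._channels.values():
            if ch.updated:
                send(ch.name, ch.value)   # stand-in for csoundSetControlChannel
                ch.updated = False
                sent += 1
        return sent

log = []
table = ControlChannelTable()
table.add("cutoff", 1000.0)
table.add("gain", 0.5)
table.flush(lambda n, v: log.append((n, v)))   # initial values
table.set("gain", 0.8)
table.flush(lambda n, v: log.append((n, v)))   # only "gain" changed
print(log)  # [('cutoff', 1000.0), ('gain', 0.5), ('gain', 0.8)]
```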

4.1.3 Limitations 

Using CsoundEmscripten, it is possible to add Csound’s audio processing and synthesis capabilities to any web browser that supports the Web Audio API. Unfortunately, this approach to bringing Csound to the web comes with a number of drawbacks.

Although Javascript engines are constantly improving in speed and efficiency, running Csound entirely in Javascript is a processor-intensive task on modern systems. This is especially troublesome when trying to run even moderately complex CSD files on mobile computing devices.

Another limitation is due to the design of the ScriptProcessorNode part of the Web Audio API. Unfortunately, the ScriptProcessorNode runs on the main thread. This can result in audio glitching when another process on the main thread, such as the UI, causes a delay in audio processing. As part of the W3C’s review of the Web Audio specification, it has been suggested that the ScriptProcessorNode be moved off the main thread²³. The Web Audio API developers have also resolved to make it possible to use the ScriptProcessorNode with web workers²⁴. Hopefully, in a future version of the Web Audio API, the ScriptProcessorNode will be more capable of running the kind of complex audio processing and synthesis allowed by the Csound library.

23 https://github.com/w3ctag/spec-reviews/blob/master/2013/07/WebAudio.md#issue-scriptprocessornode-is-unfit-for-purpose-section-1
24 https://www.w3.org/Bugs/Public/show_bug.cgi?id=17415#c94
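The size of the main-thread problem can be estimated from the buffer period. Each ScriptProcessorNode callback must be serviced roughly within one buffer's duration, so any main-thread stall longer than that risks an audible dropout. The Python sketch below is a rough model that ignores browser-internal buffering and the time Csound itself spends. The function names are illustrative. The buffer sizes are the powers of two the Web Audio API accepts for createScriptProcessor(); a value of 0, which lets the browser choose, is not modelled.

```python
VALID_SCRIPT_PROCESSOR_SIZES = (256, 512, 1024, 2048, 4096, 8192, 16384)

def callback_budget_ms(buffer_size: int, sample_rate: float) -> float:
    """Duration of one ScriptProcessorNode buffer, i.e. the time to produce the next one."""
    if buffer_size not in VALID_SCRIPT_PROCESSOR_SIZES:
        raise ValueError("buffer size must be a power of two from 256 to 16384")
    return 1000.0 * buffer_size / sample_rate

def may_glitch(main_thread_stall_ms: float, buffer_size: int, sample_rate: float) -> bool:
    """Rough check: a stall longer than one buffer period risks an underrun."""
    return main_thread_stall_ms > callback_budget_ms(buffer_size, sample_rate)

print(round(callback_budget_ms(1024, 44100), 2))  # 23.22
print(may_glitch(50.0, 1024, 44100))              # True
```

Larger buffers tolerate longer stalls but add latency, which is the trade-off the proposed move off the main thread is meant to avoid.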

This version of Csound also does not support plugins, making some opcodes unavailable. Additionally, MIDI I/O is not currently supported. This is not due to technical limitations of Emscripten. Rather, it was not implemented because of the current lack of support for the WebMIDI standard in Mozilla Firefox²⁵ and in the WebKit library²⁶.

5 Beyond Web Audio: Creating Audio Applications with PNaCl

As an alternative to the development of audio applications for web deployment in pure Javascript, it is possible to take advantage of the Native Client (NaCl) platform²⁷. This allows the use of C and C++ code to create components that are accessible to client-side Javascript and run natively inside the browser. NaCl is described as a sandboxing technology, as it provides a safe environment for code to be executed in an OS-independent manner [Yee et al., 2009] [Sehr et al., 2010]. This is not completely unlike the use of Java with the Java Web Start technology (JAWS), which has been discussed elsewhere in relation to Csound [Lazzarini et al., 2012b].

There are two basic toolchains in NaCl: native/gcc and PNaCl [Donovan et al., 2010]. While the former produces architecture-dependent code (arm, x86, etc.), the latter is completely independent of any existing architecture. NaCl is currently only supported by the Chrome and Chromium browsers. Since version 31, Chrome enables PNaCl by default, allowing applications created with that technology to work completely out-of-the-box. While PNaCl modules can be served from anywhere in the open web, native-toolchain NaCl applications and extensions can only be installed from Google’s Chrome Web Store.

5.1 The Pepper Plugin API 

An integral part of NaCl is the Pepper Plugin API (PPAPI, or just Pepper). It offers various services, of which interfacing with Javascript and accessing the audio device are particularly relevant to our ends. All of the toolchains also include support for parts of the standard C library (e.g. stdio) and, very importantly for Csound, the pthread library. However, dlopen() and related functions are absent from the PNaCl toolchain, which means no dynamic loading is available there.

25 https://bugzilla.mozilla.org/show_bug.cgi?id=836897
26 https://bugs.webkit.org/show_bug.cgi?id=107250
27 https://developers.google.com/native-client

Javascript client-side code is responsible for requesting the loading of a NaCl module. Once the module is loaded, execution is controlled through Javascript event listeners and message passing. Pepper provides a postMessage() method for communication from Javascript to the PNaCl module, which triggers a message handler on the C/C++ side. In the opposite direction, a message event is issued when C/C++ code calls the equivalent PostMessage() function.
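The two-way flow can be pictured with a small in-process model. Messages posted from the page trigger a handler on the module side, and the module replies by raising message events that page-side listeners receive. The Python sketch below is a hypothetical illustration of that pattern only. `FakeModule`, the "ping"/"pong" protocol and the method names are invented, and delivery is synchronous here, whereas Pepper messaging in the browser is asynchronous.

```python
from typing import Callable, List

class FakeModule:
    """In-process stand-in for a NaCl module: messages in, message events out."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[str], None]] = []

    def add_message_listener(self, listener: Callable[[str], None]) -> None:
        # Plays the role of a Javascript 'message' event listener.
        self._listeners.append(listener)

    def post_message(self, data: str) -> None:
        # Javascript -> module: triggers the module-side handler.
        self._handle_message(data)

    def _handle_message(self, data: str) -> None:
        # Module-side handler; replies through the equivalent of PostMessage().
        reply = "pong" if data == "ping" else f"unknown: {data}"
        for listener in self._listeners:
            listener(reply)

received: List[str] = []
module = FakeModule()
module.add_message_listener(received.append)
module.post_message("ping")
module.post_message("play")
print(received)  # ['pong', 'unknown: play']
```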

Audio output is well supported in Pepper with a mid-latency callback mechanism (ca. 10.7-11.6 ms, i.e. 512 frames at a 48 or 44.1 kHz sampling rate). Its performance appears to be very uniform across the various platforms. The Audio API design is very straightforward, although the library is a little rigid in terms of parameters: it supports only stereo, at one of the two sampling rates mentioned above. Audio input is not yet available in the production release, but support can already be seen in the development repository.
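The callback period is the buffer size $N$ divided by the sampling rate $f_s$. For a 512-frame buffer at the two supported rates this gives:

```latex
T_{\text{callback}} = \frac{N}{f_s}, \qquad
\frac{512}{44100\,\text{Hz}} \approx 11.6\,\text{ms}, \qquad
\frac{512}{48000\,\text{Hz}} \approx 10.7\,\text{ms}
```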

The most complex part of NaCl is access to local files. In short, there is no open access to the client disk, only to sandboxed filesystems. It is possible to mount a server filesystem (through httpfs), a memory filesystem (memfs), as well as local temporary or permanent filesystems (html5fs). These filesystems can only be mounted and accessed through the NaCl module, which means that any copying of data from the user disk into these partitions has to be mediated by code written in the NaCl module. For instance, it is possible to take advantage of the HTML5 file tag, and to get data from NaCl into a Javascript blob so that it can be saved to the user’s disk. It is also possible to copy a file from disk into the sandbox using the URLReader service supplied by Pepper.

5.2 PNaCl 

The PNaCl toolchain compiles code down to a portable bitcode executable (called a pexe). When this is delivered to the browser, an ahead-of-time compiler is used to translate the code into native form. A web application using PNaCl will contain three basic components: the pexe binary, a manifest file describing it, and a client-side script in JS, which loads the module and allows interaction with it via the Pepper messaging system.

5.3 Csound for PNaCl 

A fully functional implementation of Csound for Portable Native Client is available from http://vlazzarini.github.io. The package is composed of three elements: the Javascript module (csound.js), the manifest file (csound.nmf), and the pexe binary (csound.pexe). The source for the PNaCl component is also available from that site (csound.cpp). It depends on the Csound and Libsndfile libraries compiled for PNaCl, and on the NaCl SDK. A Makefile for PNaCl exists in the Csound 6 sources.

5.3.1 The Javascript interface 

Users of Csound for PNaCl will only interact with the services offered by the Javascript module. Typically, an application written in HTML5 will require the following elements to use it:

• the csound.js script

• a reference to the module using a div tag with id="engine"

• a script containing the code to control Csound.

The script will contain calls to methods in csound.js, such as:

• csound.Play() - starts performance

• csound.PlayCsd(s) - starts performance from a CSD file s, which can be in ./http/ (ORIGIN server) or ./local/ (local sandbox).

• csound.RenderCsd(s) - renders a CSD file s, which can be in ./http/ (ORIGIN server) or ./local/ (local sandbox), with no RT audio output. The “finished render” message is issued on completion.

• csound.Pause() - pauses performance

• csound.CompileOrc(s) - compiles the Csound code in the string s

• csound.ReadScore(s) - reads the score in the string s (with preprocessing support)

• csound.Event(s) - sends in the line events contained in the string s (no preprocessing)

• csound.SetChannel(name, value) - sends the control channel name the value value, both arguments being strings.
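The strings passed to csound.Event(s) or csound.ReadScore(s) are ordinary Csound score lines. An `i` statement takes the form `i p1 p2 p3 ...`, giving the instrument number, start time and duration, followed by any instrument-specific p-fields. The small Python helper below, `i_statement`, is hypothetical and simply shows how such a line can be assembled before handing it to the module.

```python
def i_statement(instr: int, start: float, dur: float, *pfields: float) -> str:
    """Format a Csound score 'i' statement: p1 instrument, p2 start, p3 duration, p4...

    Uses the 'g' format (6 significant digits); switch to repr() if more precision is needed.
    """
    fields = [instr, start, dur, *pfields]
    return "i " + " ".join(f"{f:g}" for f in fields)

line = i_statement(1, 0, 2, 0.5, 440)
print(line)  # i 1 0 2 0.5 440
```

In the page script, the resulting string would then be sent as `csound.Event(line)`, assuming the loaded orchestra defines instrument 1 with those p-fields.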


As it starts, the PNaCl module will call a moduleDidLoad() function, if it exists. This can be defined in the application script. The following callbacks are also definable:

• function handleMessage(message): called when there are messages from Csound (the PNaCl module). The string message.data contains the message.

• function attachListeners(): called when listeners for different events are to be attached.

In addition to Csound-specific controls, the module also includes a number of filesystem facilities, to allow the manipulation of resources in the server and in the sandbox:

• csound.CopyToLocal(src, dest) - copies the file src in the ORIGIN directory to the local file dest, which can be accessed at ./local/dest. The “Complete” message is issued on completion.

• csound.CopyUrlToLocal(url, dest) - copies the url url to the local file dest, which can be accessed at ./local/dest. Currently only ORIGIN and CORS urls are allowed remotely, but local files can also be passed if encoded as urls with the webkitURL.createObjectURL() Javascript method. The “Complete” message is issued on completion.

• csound.RequestFileFromLocal(src) - requests the data from the local file src. The “Complete” message is issued on completion.

• csound.GetFileData() - returns the most recently requested file data as an ArrayObject.

A series of examples demonstrating this API is provided on GitHub. In particular, an introductory example can be found at http://vlazzarini.github.io/minimal.html.

5.3.2 Limitations 

The following limitations apply to the current release of Csound for PNaCl:

• no realtime audio input (not yet supported in Pepper/NaCl)

• no MIDI in the NaCl module. However, it might be possible to implement MIDI in JavaScript (through WebMIDI) and, using the csound.js functions, send control data to Csound and respond to the various channel messages.

• no plugins, as PNaCl does not support dlopen() and related functions. This means some Csound opcodes are not available, as they reside in plugin libraries. It might be possible to add some of these opcodes statically to the Csound PNaCl library in the future.
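The MIDI workaround suggested above amounts to translating raw MIDI bytes (as delivered by WebMIDI) into channel updates for csound.SetChannel. A Python sketch of that translation follows. It uses the standard note-on/note-off status bytes, the usual equal-tempered mapping with note 69 at 440 Hz, and the convention that a note-on with velocity 0 acts as a note-off. The channel names "freq" and "amp" are hypothetical; a CSD would need to read them, for example with chnget. Values are returned as strings because SetChannel takes string arguments.

```python
from typing import List, Tuple

NOTE_ON, NOTE_OFF = 0x90, 0x80

def midi_to_channel_updates(msg: bytes) -> List[Tuple[str, str]]:
    """Map a 3-byte MIDI note message to (channel name, value) string pairs."""
    if len(msg) < 3:
        return []  # e.g. 2-byte messages such as program change are ignored
    status, note, velocity = msg[0] & 0xF0, msg[1], msg[2]  # drop MIDI channel nibble
    if status == NOTE_ON and velocity > 0:
        freq = 440.0 * 2 ** ((note - 69) / 12)  # equal temperament, A4 = note 69
        return [("freq", str(freq)), ("amp", str(velocity / 127))]
    if status == NOTE_OFF or (status == NOTE_ON and velocity == 0):
        return [("amp", "0.0")]
    return []  # other message types are ignored in this sketch

print(midi_to_channel_updates(bytes([0x90, 69, 127])))  # [('freq', '440.0'), ('amp', '1.0')]
```

Each returned pair would then be passed as `csound.SetChannel(name, value)` from the page script.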

6 Conclusions 

In this paper we reviewed the current state of support for the development of web-based audio and music applications. As part of this, we explored two approaches to deploying Csound as an engine for general-purpose media software. The first consisted of a Javascript version created with the help of the Emscripten compiler, and the second a native C/C++ port for the Native Client platform, using the Portable Native Client toolchain. The first has the advantage of enjoying widespread support by a variety of browsers, but is not yet fully deployable. On the other hand, the second approach, while at the moment only running on Chrome and Chromium browsers, is a robust and ready-for-production version of Csound.
Abstract

Subjective listening tests are an essential tool for the evaluation and comparison of audio processing algorithms. In this paper we introduce BeaqleJS, a framework based on HTML5 and JavaScript to run listening tests in any modern web browser. This allows easy distribution of the test environment to a large number of participants, in combination with simple configuration and good expandability.

Keywords 

listening test, subjective audio evaluation, HTML5, 
JavaScript 

1 Introduction 

Frequently used physical measures to judge the quality of audio signals, like the signal-to-noise ratio or signal distortion, do not correlate well with the perception of quality by the human hearing system. Therefore, listening tests, also named subjective audio evaluations, play an important role in the comparison of signal processing algorithms like audio effects and codecs.

The setup of a test environment and the selection of items under test are crucial to yield significant and non-biased results. Some guidance and standards can be found, for example, in the International Telecommunication Union (ITU) recommendations and in particular in [1]. Still, one of the biggest problems is to address an adequate number of qualified participants. Closely connected is the problem of distributing the test environment to the various platforms of the participants, and of how the results can be merged and evaluated afterwards.

In this paper BeaqleJS (browser based evaluation of audio quality and comparative listening environment) is presented, which is a framework to easily set up and run listening tests in any modern web browser. To achieve this, BeaqleJS relies purely on open web standards like HTML5 and JavaScript, without the need for further browser plugins or extensions. It is published under the GPLv3 open source license and its source code is available on GitHub¹.

The following section 2 will first introduce some background information about listening tests and common standards in general. Afterwards, the BeaqleJS framework is described in section 3, and section 4 outlines advanced usage scenarios like modifying or implementing new test schemes as well as server-side evaluation and data collection. Finally, section 5 will give a conclusion and outlook.

2 Listening test standards and basics 

The difficulty in setting up a listening test comes from the fact that humans are rarely objective in their judgements. Therefore, the challenge is to design a test environment that minimizes external influences and yields non-biased results. To avoid mistakes it is advised to stick close to standardised instructions and test procedures as they are, for example, defined by the ITU in [1] [2] [3].

Test items should be presented in random order together with neutral names, avoiding any association with the underlying algorithms. If several different algorithms are compared in one test, the corresponding items should always appear at a random position, to prevent the listeners from recognizing or learning the connection between a rating and its item position.
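Randomizing the order and hiding the names can be combined in one step. Shuffle the conditions with a Fisher-Yates shuffle, so that every ordering is equally likely, then give them neutral letters in the shuffled order. This should be done afresh for each trial and participant. The Python sketch below shows the idea. The condition names are hypothetical, and it is not BeaqleJS's own code.

```python
import random
import string
from typing import Dict, List, Sequence

def shuffle_items(items: Sequence[str], rng: random.Random) -> List[str]:
    """Fisher-Yates shuffle: every ordering of the items is equally likely."""
    order = list(items)
    for i in range(len(order) - 1, 0, -1):
        j = rng.randint(0, i)          # inclusive on both ends
        order[i], order[j] = order[j], order[i]
    return order

def assign_neutral_labels(items: Sequence[str], rng: random.Random) -> Dict[str, str]:
    """Map neutral labels ('A', 'B', ...) to items in a fresh random order."""
    if len(items) > len(string.ascii_uppercase):
        raise ValueError("too many items for single-letter labels")
    return dict(zip(string.ascii_uppercase, shuffle_items(items, rng)))

conditions = ["hidden_reference", "codec_a", "codec_b", "anchor"]  # hypothetical names
labels = assign_neutral_labels(conditions, random.Random(7))
# Listeners only ever see 'A'-'D'; the mapping is kept for the evaluation.
```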

It is also necessary to find a way to judge the ability of the participants to understand the test procedure, or even whether they are able to perceive any differences between the items at all. For this purpose, a hidden reference and an anchor signal can be mixed among the test items. In valid test results, the participants should always rate the hidden reference with the same quality as the visible reference. In contrast, an anchor signal is an obviously bad test item, for example heavily lowpass filtered, that is expected to always receive the worst rating and will set the bottom end of the scale.

1 GitHub is a source code hosting platform using the git version control system http://www.github.com
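Those two expectations translate directly into a post-screening check on each trial. The hidden reference should score near the visible reference, and the anchor should receive the lowest score. The Python sketch below makes several assumptions. It uses a 0-100 rating scale with the visible reference at 100, items named "hidden_reference" and "anchor", and an arbitrary example tolerance. A real evaluation should take its screening rule from the applicable ITU recommendation.

```python
from typing import Dict

def passes_screening(ratings: Dict[str, float], tolerance: float = 10.0) -> bool:
    """Check one trial's ratings against the hidden-reference and anchor expectations.

    Assumes a 0-100 scale with the visible reference at 100; the tolerance is an
    arbitrary example value, not a standardised threshold.
    """
    hidden = ratings["hidden_reference"]
    anchor = ratings["anchor"]
    hidden_ok = hidden >= 100.0 - tolerance       # rated (almost) like the reference
    anchor_ok = anchor == min(ratings.values())   # anchor at the bottom of the scale
    return hidden_ok and anchor_ok

print(passes_screening({"hidden_reference": 95, "codec_a": 70, "anchor": 20}))  # True
print(passes_screening({"hidden_reference": 40, "codec_a": 85, "anchor": 20}))  # False
```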

The experience of the participants can have a strong influence on the results. People that are trained to hear analytically and know what artefacts they have to listen for can usually give more detailed feedback. On the other hand, they might be quite biased in their understanding of audio quality, and typical consumers may highlight completely different aspects. Therefore, it makes sense to consider and document the background of the participants.

In a decentralized and distributed test setup, it is important to ensure comparable playback conditions. This is best achieved when all participants use high-quality studio headphones, as these largely eliminate the influence of room acoustics and can be expected to have a fairly linear frequency response over a broad frequency range.

The audio test items should be chosen to reflect and highlight a variety of characteristics of the tested algorithms. This can include, for example, transient, noisy and harmonic signals. A good starting point for choosing audio is the SQAM (Sound Quality Assessment Material) CD from the European Broadcasting Union (EBU) [4], which is well established in the audio coding field. The individual test items should be quite short and not exceed a length of 10 seconds. On the one hand, this helps to keep the attention of all listeners focused on the same part of the item; on the other hand, it prevents fatigue of the participants from influencing the results. For the same reason, the total time needed to perform the whole test should be kept below 15 minutes.

3 The BeaqleJS framework 

BeaqleJS provides a framework to create browser-based listening tests and is based purely on open web standards like HTML5 and JavaScript. For the user interface and to simplify Document Object Model (DOM) manipulations, the well-known jQuery and jQuery UI libraries [5] are used.

The general structure of BeaqleJS can be divided into three blocks (Fig. 1). There is a common HTML5 index.html file that holds the main HTML structure with some basic placeholder blocks whose content is dynamically created by the JavaScript backend. The styling is completely independent and done with the help of cascading style sheets (CSS). Style sheets, config files and all necessary JavaScript libraries are loaded in the header of index.html. Most of the descriptive text, like the introduction and instructions, is placed in hidden blocks inside this file, and their visibility is controlled by the scripts.

Figure 1: Schematic overview of the BeaqleJS framework.

The JavaScript backend consists of two main classes. The first one is the AudioPool, which takes care of audio playback and buffering. It pools a set of HTML5 <audio> elements in a dedicated AudioPool <div> element. There are simple functions to add and load a new file, connect it to an ID and address it by that ID, manage playback and looping, and perform synchronized pause and stop operations.
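The pooling idea can be illustrated with a simplified, DOM-independent sketch; it is not the actual BeaqleJS AudioPool, and the class name `MiniAudioPool` and its methods are hypothetical. In a browser, the injected factory would be `() => new Audio()` (the standard HTMLAudioElement constructor); injecting it keeps the sketch testable outside a browser.

```javascript
// Hypothetical, simplified pool of audio players addressed by IDs
// (a sketch of the idea, not the BeaqleJS AudioPool implementation).
class MiniAudioPool {
  constructor(createElement) {
    this.createElement = createElement; // e.g. () => new Audio() in a browser
    this.players = new Map();           // ID -> audio element
  }

  // Create a player for a file, ask for buffering and register it under an ID.
  addAudio(path, id) {
    const el = this.createElement();
    el.preload = "auto";
    el.src = path;
    this.players.set(id, el);
    return el;
  }

  // Play one item; all other items are paused so only one is audible.
  play(id, loop = false) {
    const el = this.players.get(id);
    if (!el) throw new Error("unknown audio ID: " + id);
    for (const [otherId, other] of this.players) {
      if (otherId !== id) other.pause();
    }
    el.loop = loop;
    return el.play(); // returns a Promise in current browsers
  }

  // Synchronized stop: pause every player and rewind it to the start.
  stopAll() {
    for (const el of this.players.values()) {
      el.pause();
      el.currentTime = 0;
    }
  }
}
```

Pausing all other players before starting one is what lets a listener switch instantly between items without overlapping playback.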

The ListeningTest class provides the main functions of an abstract listening test. This includes the setup and management of basic playback controls (play, pause, looping, timeline display, ...), reading of the test configuration, storage of the results and overall control of the test sequence.

To create a certain test type, the abstract ListeningTest class is inherited, and specific functions for the actual arrangement of test items or for storing and evaluating the results need to be implemented (cf. section 4.1). Thanks to this modular approach, it is very easy to extend the framework with additional test types or to create variants of existing ones without reimplementing all the necessary basics.

If the test is distributed over the internet or over several local computers, the ListeningTest main class is also able to send the final ratings to a web service for centralised collection and evaluation.

Table 1: Overview of supported codecs and audio formats in current web browsers.

Browser           | WAV PCM | Ogg Vorbis  | MP3            | AAC
Internet Explorer | no      | no          | ≥ 9.0          | ≥ 9.0
Firefox           | ≥ 3.5   | ≥ 3.5       | ≥ 26, not OS X | ≥ 26, not OS X
Chrome            | yes     | yes         | yes            | yes
Opera             | ≥ 11.00 | ≥ 10.50     | ≥ 14           | ≥ 14
Safari            | ≥ 3.1   | with XiphQT | ≥ 3.1          | ≥ 3.1

3.1 Codec support 

Although the HTML5 markup language is already widely used on the internet, there is no finally adopted standard yet, only various drafts from the World Wide Web Consortium (W3C). This may be one of the reasons why the degree of implementation can differ considerably between web browsers.

Our main interest regarding browser compatibility is the HTML5 <audio> element and its support for various file types, where lossless formats like WAV or FLAC would be best suited for the desired application. An overview of the supported formats is given in Table 1; unfortunately, no browser supports FLAC² or other lossless codecs so far. The only lossless (but also uncompressed) format that is widely accepted is WAV PCM with 16 bit sample precision. Only Internet Explorer is not capable of playing back this file type.
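Because support differs between browsers, a test page can probe it at runtime. The sketch below uses the standard `HTMLMediaElement.canPlayType()` method, which returns `"probably"`, `"maybe"` or an empty string, to pick the first supported format from a preference list that starts with lossless formats. The helper name `pickFormat` and the candidate list are assumptions; the element is passed in so the function can also be exercised outside a browser.

```javascript
// Preference order (an assumption): lossless first, lossy only as a fallback.
const CANDIDATES = [
  { ext: "flac", mime: "audio/flac" },
  { ext: "wav", mime: 'audio/wav; codecs="1"' },          // codec 1 = PCM
  { ext: "ogg", mime: 'audio/ogg; codecs="vorbis"' },
  { ext: "mp3", mime: "audio/mpeg" },
  { ext: "m4a", mime: 'audio/mp4; codecs="mp4a.40.2"' },  // AAC-LC
];

// Hypothetical helper: extension of the first format the element may play.
function pickFormat(audioElement, candidates = CANDIDATES) {
  // canPlayType() returns "" when the type is definitely not supported
  const hit = candidates.find((c) => audioElement.canPlayType(c.mime) !== "");
  return hit ? hit.ext : null;
}

// In a browser: pickFormat(document.createElement("audio"))
```

Keeping the preference order strict means a lossless format is always chosen when the browser reports any support for it at all.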

The overall situation regarding support for a common codec is quite unsatisfying. At the time of writing, the only recommendation for an audio listening test environment would be to use the WAV PCM format. The fact that it is completely uncompressed (data rate approx. 88 kB/s per channel at 44.1 kHz sample rate and 16 bit) is put into perspective by the recommendation that individual audio test items be quite short, usually not more than 10 seconds; therefore, the overall amount of audio data to be loaded is limited to around 1-2 MB per test item.
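The data rate and file size of uncompressed PCM follow directly from sample rate, word length and channel count. A small helper (names hypothetical) makes the arithmetic explicit; the 44-byte overhead assumes a canonical RIFF/WAVE header without additional chunks.

```javascript
// Uncompressed PCM data rate in bytes per second.
function pcmBytesPerSecond(sampleRate, bitsPerSample, channels) {
  return sampleRate * (bitsPerSample / 8) * channels;
}

// Approximate WAV file size: PCM payload plus a canonical 44-byte header.
function wavFileSize(seconds, sampleRate = 44100, bitsPerSample = 16, channels = 2) {
  const CANONICAL_HEADER_BYTES = 44; // assumption: no extra RIFF chunks
  const payload = Math.round(seconds * pcmBytesPerSecond(sampleRate, bitsPerSample, channels));
  return payload + CANONICAL_HEADER_BYTES;
}

console.log(pcmBytesPerSecond(44100, 16, 1)); // 88200 bytes/s per channel
console.log(wavFileSize(10));                 // 1764044 bytes, about 1.8 MB for 10 s stereo
```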


2 It should be noted that there is a JavaScript-based audio decoder framework named Aurora.js that is available together with a FLAC decoder on GitHub (https://github.com/audiocogs/aurora.js). However, its suitability still has to be investigated.


3.2 Predefined tests 

As described at the beginning of section 3, the main ListeningTest class only provides an abstract implementation with the core functionality of a generic listening test. Two implementations of specific listening tests are currently available in BeaqleJS. The simplest one is the so-called ABX test, which is best suited for understanding the functioning and internals of the whole framework. The other one is the so-called MUSHRA (MUlti Stimulus test with Hidden Reference and Anchor), which is widely used in many evaluation scenarios and is therefore one of the most common test types. It is defined by the ITU in recommendation BS.1534-1 [3].

3.2.1 ABX 

In an ABX test (see Fig. 2), three items named A, B and X are presented to the listener, where X is randomly chosen to be identical to either A or B. The listener has to identify which item is hidden behind X, or which one (A or B) is closest to X. If the listener is able to find the correct item reliably, this reveals that there are perceptual differences between A and B.
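The core ABX logic fits in a few lines: X is assigned at random to A or B, and the listener's answer is compared with that hidden assignment. The function names and the trial object layout below are hypothetical and not taken from the BeaqleJS ABX implementation.

```javascript
// Hypothetical ABX trial: X is secretly A or B with equal probability.
function createAbxTrial(fileA, fileB, rng = Math.random) {
  const xIsA = rng() < 0.5;
  return {
    A: fileA,
    B: fileB,
    X: xIsA ? fileA : fileB,
    answer: xIsA ? "A" : "B", // kept hidden from the listener
  };
}

// True if the listener identified the item hidden behind X.
function isCorrect(trial, choice) {
  return choice === trial.answer;
}

const trial = createAbxTrial("schubert_orig.wav", "schubert_coded.wav");
console.log(trial.X, isCorrect(trial, "A"));
```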

A typical application of ABX tests is the evaluation of the transparency of audio codecs. For example, item A could be an unencoded audio snippet and B the same snippet encoded with a lossy codec. If the listener is not able to identify whether A or B was hidden in X (the answers are randomly distributed), one can assume that the audio coding was transparent.
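Whether ABX answers are "randomly distributed" can be checked with a one-sided binomial test against the guessing probability of 0.5. The sketch computes the probability of getting at least k correct answers out of n trials by pure guessing; a small value (below 0.05 is a common convention, not a prescribed threshold) indicates an audible difference. A large value only means that no difference could be demonstrated with this number of trials. Function names are hypothetical.

```javascript
// Binomial coefficient C(n, k); every partial product is an integer.
function binom(n, k) {
  if (k < 0 || k > n) return 0;
  let result = 1;
  for (let i = 1; i <= k; i++) {
    result = (result * (n - k + i)) / i;
  }
  return result;
}

// One-sided p-value: P(at least k correct out of n) when guessing with p = 0.5.
function abxPValue(k, n) {
  let sum = 0;
  for (let i = k; i <= n; i++) sum += binom(n, i);
  return sum / Math.pow(2, n);
}

console.log(abxPValue(9, 10).toFixed(4));  // "0.0107"
console.log(abxPValue(12, 16).toFixed(4)); // "0.0384"
```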

3.2.2 MUSHRA 

In a MUSHRA test (see Fig. 3), the listener is presented with an item marked as reference together with several anonymous test items. Using a slider for each test item, the listener has to rate how close the items are to the reference on top. Among the test items there is usually also one hidden reference and one or several anchor signals to verify the validity of the ratings and the qualification of the participants.

Figure 2: Screenshot of the ABX demo test. The screenshot shows the test set "Schubert (1 of 2)" with play buttons A, X and B, a stop button, a Loop option, the prompt "Please select the item which is closest to X!" and Previous Test/Next Test navigation.

Figure 3: Screenshot of a MUSHRA test. The screenshot shows the test set "Castanets (2 of 2)" with a reference and five test items, each with Play and Stop buttons; each test item is rated on a slider labelled Bad, Poor, Fair, Good, Excellent.

In contrast to ABX tests, the MUSHRA procedure allows more detailed evaluations, as it is possible to compare more than one algorithm to a reference. Furthermore, the results are on a continuous scale, allowing a direct numerical comparison of all algorithms under test.
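Because MUSHRA ratings lie on a continuous 0 to 100 scale, results are commonly summarised per algorithm as a mean with a 95% confidence interval. The sketch below uses the normal approximation (factor 1.96); for the small listener panels typical of such tests, a Student-t quantile would be the more careful choice. The function name and data layout are assumptions.

```javascript
// Mean and approximate 95% confidence interval of one algorithm's ratings.
function summarize(ratings) {
  const n = ratings.length;
  if (n < 2) throw new Error("need at least two ratings");
  const mean = ratings.reduce((s, r) => s + r, 0) / n;
  // sample variance (n - 1 in the denominator)
  const variance = ratings.reduce((s, r) => s + (r - mean) ** 2, 0) / (n - 1);
  const halfWidth = 1.96 * Math.sqrt(variance / n); // normal approximation
  return { mean, low: mean - halfWidth, high: mean + halfWidth };
}

// Usage: ratings of one algorithm collected from several listeners
console.log(summarize([70, 80, 90])); // mean 80, interval about 68.7 to 91.3
```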

4 Advanced usage 

If one of the predefined test classes already covers the desired test requirements, it is only necessary to create a set of test items and to define the corresponding paths in the config script. But it is also quite simple to slightly modify existing test layouts and structures, or even to implement completely new test schemes.

4.1 Implementation of new tests 

All new test classes have to inherit the base functionality from the main ListeningTest class. Inheritance in JavaScript is achieved through prototypes: a new class MyTest is defined, and its prototype is set to a new instance of the base class. As this overwrites the constructor, the constructor has to be reset to the child constructor afterwards (listing 1). The child class can access the TestState (listing 2) and the TestConfig (listing 3) from the parent. The former can be used to store random file mappings, ratings and other status variables, but it can also be dynamically extended with specific fields required by the child class. The TestConfig structure is simply a mapping of the BeaqleJS config file into the class namespace. It has to contain at least the fields and structure shown in listing 3, but it is possible to add further sections, which are then read only by the child class.

// inherit from ListeningTest
function MyTest(TestData) {
    ListeningTest.apply(this, arguments);
}

MyTest.prototype = new ListeningTest();
MyTest.prototype.constructor = MyTest;

// implement the necessary functions
MyTest.prototype.createTestDOM = ...
MyTest.prototype.saveRatings = ...
MyTest.prototype.readRatings = ...
MyTest.prototype.formatResults = ...

Listing 1: Creation of a new test class MyTest inheriting from ListeningTest.

ListeningTest.TestState = {
    // main public members
    'CurrentTest': -1,
    'TestIsRunning': false,
    'FileMappings': {},
    'Ratings': {},
    'EvalResults': {},
    // ...
    // optionally add own fields
    // ...
}

Listing 2: The TestState structure.
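To try the inheritance pattern of listing 1 in isolation, the following self-contained sketch replaces the real ListeningTest with a tiny hypothetical stand-in (its fields mirror listing 2, its behaviour is invented for illustration). It links the prototypes with `Object.create()` instead of `new ListeningTest()`, a common alternative that avoids running the base constructor without arguments; the constructor reset is the same as in listing 1.

```javascript
// Hypothetical stand-in for the BeaqleJS base class, only for illustration.
function ListeningTest(TestData) {
  this.TestConfig = TestData;
  // per-instance state, fields as in listing 2
  this.TestState = { CurrentTest: -1, TestIsRunning: false, FileMappings: {}, Ratings: {}, EvalResults: {} };
}
ListeningTest.prototype.formatResults = function () {
  throw new Error("formatResults() must be implemented by the child class");
};

// Child class: run the parent constructor on the new instance ...
function MyTest(TestData) {
  ListeningTest.apply(this, arguments);
}
// ... link the prototypes and restore the overwritten constructor reference
MyTest.prototype = Object.create(ListeningTest.prototype);
MyTest.prototype.constructor = MyTest;

// Override an "abstract" method of the parent
MyTest.prototype.formatResults = function () {
  return "<p>" + this.TestConfig.TestName + ": done</p>";
};

const t = new MyTest({ TestName: "Test 1" });
console.log(t instanceof ListeningTest, t.constructor === MyTest); // true true
console.log(t.formatResults()); // <p>Test 1: done</p>
```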

Every new test class has to implement at least 
four new functions: 

• createTestDOM(TestIdx) creates the visible layout and HTML structure of the test with the index TestIdx based on the test configuration. All the necessary information from the config is available inside the object in this.TestConfig.*. Audio files should be appended to the AudioPool with this.addAudio() and can then be connected to play buttons by unique file IDs. Random mappings of filenames to IDs can be stored in the prepared this.TestState.FileMappings[TestIdx] structure (listing 2).

• saveRatings(TestIdx) is used to obtain the ratings from the sliders or buttons in the DOM and to store them in an arbitrary format in the predefined this.TestState.Ratings[TestIdx].* object.

• readRatings(TestIdx) is intended to read the ratings for TestIdx from this.TestState.Ratings[TestIdx] and to reapply them to the sliders or buttons in the DOM. This is primarily used when switching back and forth in the test sequence.

• formatResults(TestIdx) is automatically called after the final test in the sequence. It is supposed to evaluate and summarize the ratings and to store the final results in this.TestState.EvalResults. It should return a string containing the results formatted in a human-readable manner (HTML). This will be presented to the listener after the last test, and the EvalResults structure may be sent to a web service (section 4.2).

ListeningTest.TestConfig = {
    "TestName": "Test 1",
    "LoopByDefault": true,
    "EnableABLoop": true,
    "EnableOnlineSubmission": false,
    "BeaqleServiceURL": "http://...",
    "SupervisorContact": "super@visor.com",
    "Testsets": [
        {
            "Name": "Test set 1",
            "Files": {
                // ...
            }
        },
        {
            "Name": "Test set 2",
            "Files": {
                // ...
            }
        }
    ],
    // ...
    // further test specific settings
    // ...
}

Listing 3: The TestConfig structure.
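One possible formatResults() for a MUSHRA-like test can be written without any DOM access. This is a sketch under an explicit assumption: saveRatings() is taken to have stored each trial as this.TestState.Ratings[TestIdx] = { algorithmName: sliderValue }, with the neutral labels already resolved; the real storage format is left to each test class, and the TestIdx argument is unused here.

```javascript
// Hypothetical formatResults() for a MUSHRA-like test. Assumed storage format:
// this.TestState.Ratings[TestIdx] = { algorithmName: sliderValue, ... }
function formatResults() {
  const sums = {};
  const counts = {};
  // accumulate the ratings of every algorithm over all trials
  for (const trial of Object.values(this.TestState.Ratings)) {
    for (const [algo, value] of Object.entries(trial)) {
      sums[algo] = (sums[algo] || 0) + value;
      counts[algo] = (counts[algo] || 0) + 1;
    }
  }
  let html = "<table><tr><th>Algorithm</th><th>Mean rating</th></tr>";
  for (const algo of Object.keys(sums)) {
    const mean = sums[algo] / counts[algo];
    this.TestState.EvalResults[algo] = mean; // machine-readable result for submission
    html += "<tr><td>" + algo + "</td><td>" + mean.toFixed(1) + "</td></tr>";
  }
  return html + "</table>"; // human-readable summary shown to the listener
}

// Usage: a plain object stands in for a test instance
const demoTest = {
  TestState: {
    Ratings: { 0: { reference: 100, lowpass: 20 }, 1: { reference: 90, lowpass: 30 } },
    EvalResults: {},
  },
  formatResults,
};
console.log(demoTest.formatResults());
console.log(demoTest.TestState.EvalResults); // { reference: 95, lowpass: 25 }
```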


4.2 Server side data collection and evaluation

Unfortunately, JavaScript running in a web browser can neither send emails directly nor store files locally. Therefore, to collect the results automatically, some kind of web service reachable from your network is required. This can be implemented, for example, with Python, Node.js or simply PHP.

The ListeningTest class includes the basic functionality to pack the results object this.TestState.EvalResults into a JSON (JavaScript Object Notation) structure and to transfer it to a web service for collection and further evaluation. A simple PHP example is included in the beaqleJS_Service.php file. It receives a JSON-encoded data structure and writes its content into a text file with the timestamp as filename.
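On the client side, such a transfer could look like the sketch below, which posts an EvalResults object as JSON with the standard `fetch()` API. The payload layout (TestName, timestamp, results), the function name and the example URL are assumptions; the actual request format is whatever the configured BeaqleServiceURL endpoint expects. `fetchImpl` is injectable so that the request can be inspected without a network.

```javascript
// Hypothetical client-side submission of the evaluated results.
function submitResults(serviceUrl, testName, evalResults, fetchImpl = fetch) {
  const payload = {
    TestName: testName,
    timestamp: new Date().toISOString(),
    results: evalResults,
  };
  return fetchImpl(serviceUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  }).then((response) => {
    // surface HTTP errors instead of silently losing the results
    if (!response.ok) throw new Error("submission failed: HTTP " + response.status);
    return response;
  });
}

// Usage in a browser (URL is a placeholder):
// submitResults("https://example.org/beaqle-service", "Test 1", { reference: 95 });
```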

In the future, a more advanced server side evaluation could include the automatic visualisation and statistical analysis of all collected results. This can be combined with the capability to export the data in various formats for further analysis in scientific tools like SciPy, R or Matlab.

5 Conclusion 

One big difficulty in setting up a proper listening test for the subjective evaluation of audio is its distribution to a significant number of participants. This is addressed by BeaqleJS, which supplies all necessary components to run listening tests flexibly in any modern web browser. It enables various usage scenarios, ranging from complete online tests and semi-public distribution in an intranet to local installation on a single computer with a supervisor directly in attendance. The presented framework and its predecessor MushraJS have already been used in various evaluations and have proved their practical capabilities [6] [7].

However, to ensure significant and unbiased results, it is always advisable to stick closely to predefined and established test methods such as those introduced in section 2.

Further development could include a more extensive server side data evaluation and visualisation, and of course also the addition of more test schemes. The code is available at https://github.com/HSU-ANT/beaqlejs, and the reader's contributions and feedback are highly appreciated.
Abstract

The GUIDO project gathers a textual format for music representation, a rendering engine operating on this format, and a library providing high-level support for all the services related to the GUIDO format and its graphic rendering. The project now includes an HTTP server that allows users to access the musical-score-related functions in the API of the GUIDOEngine library via uniform resource identifiers (URIs). This article summarizes the core tenets of the REST architecture on which the GUIDO server is based, and goes on to explain how the server ports a C/C++ API to the web. It concludes with several examples as well as a discussion of why the REST architecture is well suited to a web API that serves as a wrapper for another API.

1 Online musical editing 

As client-server models for the processing, visualization and analysis of data become more widespread in mobile computing (WordPress, YouTube, Instagram, SoundCloud), music engraving has entered the fray with various web-based score editing services. The GUIDO HTTP server merges the idea of a web-based music editor with a RESTful web service in order to expose the public API of the GUIDOEngine library [Hoos and Hamel, 1997]. This section explores several categories of online musical editing services, concluding with a discussion of general trends in current technologies and the main problems that the tool outlined in this paper, the GUIDO HTTP server, seeks to address.

1.1 Online music notation editors 

As of the writing of this paper (2014), there are three main online musical score editors: Noteflight¹, Melodus² and Scorio³. Noteflight and Melodus seek to provide a full-featured music editing platform online, similar to the role of Google Documents in the world of office suites. Scorio is a hybrid tool that mixes rudimentary layout via a mobile editing platform with publication-quality layout via JIT compilation through LilyPond when possible.

1.2 Online score sharing software 

Several music tools, such as Sibelius⁴, MuseScore [Bonte, 2009], Maestro⁵ and Capriccio⁶, offer online services where scores composed using this software can be uploaded, browsed and downloaded. Capriccio can be run online in limited form as a Java applet. MuseScore, Sibelius and Maestro allow for automatic score/MIDI synchronisation of embedded files.

1.3 Online music JIT compilation services

WebLily⁷, LilyBin⁸ and OMET⁹ are all JIT compilation services that run the LilyPond executable to compile uploaded code and return embedded SVG, canvas or PDF visualizations, depending on the tool. The GUIDO note server [K. and Hoos, 1998] uses the GUIDOEngine library to compile Guido Music Notation Format [Hoos et al., 1998] strings into images.

1.4 A RESTful alternative 

All of the tools described above facilitate the creation or visualization of scores via a variety of input methods (WYSIWYG, text, MusicXML, etc.) but are not designed to facilitate low-latency server-client exchanges of score-related information. This is, in part, because the majority of automated music engraving programs do not offer public APIs and are not designed to provide end-user information other than visual representations of scores and various non-human-readable file formats. The GUIDO Engine API [Daudin et al., 2009] [Grame, 2014b] seeks to remedy this issue by offering a public API that reports information about scores such as the number of pages, the duration, and the placement of musical events both in time and on the page. The representational state transfer (REST) architectural style [Fielding, 2000] is well suited for porting an API to the web because it is optimized for stateless systems, meaning that the server does not need to remember intermediary states of a user. Contrast this with, for example, a server that needs to retain an undo history or the state of a logged-in user. As a result, the design of the server is clearer, and the server is quick and easy to scale [Richardson and Ruby, 2008]. This is further discussed in Section 3 and Section 4. The GUIDO HTTP server thus fills a gap in online score editing technology similar to the gap filled by Atom web feeds in news services.

1 http://www.noteflight.com
2 http://www.melod.us
3 https://scorio.com
4 http://www.sibelius.com
5 http://www.musicaleditor.com
6 http://cdefgabc.com
7 http://weblily.net
8 http://www.lilybin.com
9 http://www.omet.ca

2 Representational state transfer 

Representational state transfer [Fielding, 2000] is a ubiquitous contemporary server architecture style [Richardson and Ruby, 2008]. The REST architecture is intended as a set of constraints to facilitate exchange in systems that deliver and report on hypermedia resources. The architectural style is based on a traditional client-server model with the design trade-off that the server is stateless, meaning that all of the information required to process a request is contained in the request itself and the server does not need to store intermediary states. In order to speed up interaction with the server, the REST architecture calls for client-side caching of data, which can potentially eliminate certain redundant server requests. It also calls for a uniform interface, harmonizing all applications' interactions with the server at the expense of application-specific interaction models that could speed up exchanges. Layering is possible in this model, with intermediary servers translating various forms of shorthand into longer or less human-readable server commands. With this layering comes the constraint that exchanging agents cannot “see” beyond the layer with which they are communicating. As the burden on the client to be server-compliant is high in REST, the architectural style provides an optional constraint of servers offering downloadable code-on-demand (scripts, applets, etc.) to ease client-side software development.

Certain specific architectural elements are put into place in order to facilitate the above-described architecture. In addition to the transfer of data, REST calls for the transfer of meta-data about a server response. This gives the client side information about how to decode the response without the server having to send specific decoding instructions. REST also encourages resource requests that are constructed in a hierarchical and human-readable manner. For example, today's weather in Lyon, France is preferably accessed via

http://website.fr/France/Lyon/weather/today

rather than

http://website.fr/?country=France&town=Lyon&feature=weather&date=today

A server compliant with the REST architecture is said to be a RESTful server.

3 The GUIDO HTTP server: an overview

The GUIDO Hypertext Transfer Protocol (HTTP) server, which runs as a daemon, is a RESTful server that compiles strings written in the GUIDO Music Notation (GMN) Format into musical scores and reports several representations of this data to the client.¹⁰ It accepts user requests via two main methods of the HTTP protocol: POST, used to place elements on the server, and GET, used to retrieve information about elements on the server.

3.1 The POST method 

POST, as implemented by the GUIDO server, is RESTful insofar as it does not save any information about the user state and only saves information sent by the user.

Assuming that a GUIDO HTTP server is running on the subdomain http://guido.grame.fr on port 8000, a POST request containing the GMN code [a b c d] is sent via curl as follows:

curl -d "data=[a b c d]" http://guido.grame.fr:8000
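The same request can be issued from JavaScript. Like `curl -d`, a `URLSearchParams` body is sent as `application/x-www-form-urlencoded` (unlike curl, the value is also percent-encoded, which a form decoder turns back into the same string). The sketch assumes, as in the example above, a server at http://guido.grame.fr:8000 that answers with a JSON object containing an `ID` field; the function name is hypothetical.

```javascript
// Hypothetical client for the POST request above: returns the ID assigned by the server.
async function postScore(gmn, baseUrl = "http://guido.grame.fr:8000", fetchImpl = fetch) {
  const response = await fetchImpl(baseUrl, {
    method: "POST",
    // form-encoded body, the same content type that `curl -d` uses
    body: new URLSearchParams({ data: gmn }),
  });
  if (!response.ok) throw new Error("POST failed: HTTP " + response.status);
  const json = await response.json();
  return json.ID; // SHA-1 based identifier of the score
}

// Usage (requires a running server):
// postScore("[a b c d]").then((id) => console.log(id));
```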

10 In this paper, the terms “GMN” and “score” are used 
interchangeably when talking about music treated by or 
stored on the server. 
Assuming that the GMN code is valid, the response, in JSON, gives the user a unique identifier generated as an SHA-1 tag corresponding to the input file. This ensures that the server will not store the same information multiple times:

{
    "ID": "07a21ccbfe7fe453462fee9a86bc806c8950423f"
}

This identifier is generated via the SHA-1 cryptographic hashing algorithm [Gallagher, 2012], which maps any digital document to a 160-bit hash or key. The probability of an accidental collision is so low that it is practically impossible for two different documents to share the same SHA-1 key.
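The content-addressing idea can be reproduced with the built-in Node.js `crypto` module (`crypto.createHash("sha1")`). Note the assumption: the sketch hashes the raw GMN string as UTF-8, and it makes no claim to reproduce the exact ID shown above, since the server may hash a different byte sequence. The function name is hypothetical.

```javascript
const crypto = require("crypto");

// SHA-1 digest of a GMN string as 40 hexadecimal characters (160 bits).
function gmnId(gmn) {
  return crypto.createHash("sha1").update(gmn, "utf8").digest("hex");
}

const id1 = gmnId("[a b c d]");
const id2 = gmnId("[a b c d]");
const id3 = gmnId("[a b c e]");
console.log(id1.length);  // 40
console.log(id1 === id2); // true: identical input, identical key (no duplicate storage)
console.log(id1 === id3); // false: any change to the score yields a different key
```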

This identifier is the server's internal reference to the GMN code and is used for all subsequent requests to the server. To access the score, the identifier is appended to the URI. The following is a simple request using the SHA-1 tag (hereafter shortened to facilitate readability) that results in the image seen in Figure 1.

curl http://guido.grame.fr:8000/07a21...0423f 



Figure 1: Score with SHA-1 tag 07a21...0423f.

Technically speaking, the need to use an SHA-1 key in order to access scores and score-related information is not strictly RESTful. A strictly RESTful implementation would embed the score in every GET request. In accepting a GMN score via POST, the server must “remember” the score, which violates the principle of statelessness. Posting a resource to the server is generally considered an acceptable compromise [Richardson and Ruby, 2008] as long as the resource is uniquely identifiable by a URI and cannot be modified once uploaded to the server. This is the case with scores on the GUIDO server.

3.2 The GET method 

Requests sent via GET query the server for information about scores. The main return type is JSON for all queries related to information about a score, MIDI for MIDI realizations of the score, and PNG for all queries asking for visual representations of the score itself; the latter are also available in JPEG and SVG. All return types are specified in meta-data as per REST guidelines (see Section 2).


3.3 Uniform interface 

The RESTful style specifies that a server's interface must be uniform, meaning that the operations that it executes must be the same for all clients interacting with the server. Furthermore, these operations should be conceptually distinct, with no overlap, and should ideally be widely used. The HTTP standard provides several atomic methods that allow for uniform interaction with a server [Richardson and Ruby, 2008]. The GUIDO web API uses the GET and POST methods from HTTP via libmicrohttpd [Grothoff, 2014], leaving out less widely used methods such as PUT and DELETE in an effort to expose its full functionality to the largest possible group of client applications.

4 The GUIDO HTTP server as an API

The GUIDO HTTP server attempts to expose as much of the public API of the GUIDO Engine as possible, implementing one-to-one equivalents of its functions when possible. Arguments are passed to these functions via optional key-value pairs in the URI's query part. Defaults are provided for all key-value pairs in case of omission. An exhaustive overview of the API can be found in the GUIDO HTTP server's documentation [Grame, 2014a].

This section discusses some of the broad decisions made in exposing a C++ API via a web interface, and gives three detailed examples at the end showing how the API is exposed.

4.1 SHA-1 key as musical score 

Section 3.1 describes how SHA-1 keys replace GMN scores in URIs sent to the server, in order to avoid having to send GMN scores in GET requests. This key corresponds to both an ARHandler (Abstract Representation) and a GRHandler (Graphic Representation) of a score in the GUIDO API. These two structures are used to generate information about the musical contents of a score (ARHandler) as well as its layout (GRHandler). Representing both structures by one SHA-1 key gives the user a unique point of entry for each GMN score that conflates the data generated by several structures.

4.2 Function as URI segment 

A function in the GUIDO public API is represented as a segment of the URI sent to the server. For example, the function GuidoGetPageCount in the GUIDO public API is represented as the URI segment pagescount.

Table 1: GUIDO API public functions and their representations as URI segments.

C/C++ API          | URI segment | scope
GuidoGetPageCount  | pagescount  | score
GuidoCountVoices   | voicescount | score
GuidoDuration      | duration    | score
GuidoFindPageAt    | pageat      | score
GuidoGetPageDate   | pagedate    | score
GuidoGetPageMap    | pagemap     | score
GuidoGetSystemMap  | systemmap   | score
GuidoGetStaffMap   | staffmap    | score
GuidoGetVoiceMap   | voicemap    | score
GuidoGetTimeMap    | timemap     | score
GuidoAR2MIDIFile   | midi        | score
GuidoGetVersionStr | version     | engine
GuidoGetLineSpace  | linespace   | engine

The GUIDO public API provides two generic 
categories of functions: 

• Functions addressed to the engine and reporting information about GUIDO.

• Functions addressed to a specific score processed by GUIDO.

With the C/C++ API, functions addressed to a score take score handlers as arguments, which may be viewed as pointers to the internal score object. With HTTP, the SHA-1 tag plays the role of these score handlers, and the complete URI defines the scope of the request:

• Requests addressed to the engine are not prefixed.

• Requests addressed to a specific score are prefixed by the SHA-1 key.

For example, 

http://guido.grame.fr:8000/version 

reports the version of both GUIDO and the 
GUIDO server. On the other hand, the URI 

http://guido.grame.fr:8000/<key>/voicescount 
where <key> is a SHA-1 key 

exposes the API function GuidoCountVoices via the URI segment voicescount, giving the voice count of a specific score.

Table 1 contains a succinct list of the server's naming conventions, showing the name of a function in the GUIDO public API, its representation as a server URI segment, and its scope. Note that the only generic URI segment that does not correspond to a GUIDO public API function is server, which gives the version number of the server and thus is not related to the GUIDO API proper.
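These naming rules can be summarised in a small client-side URI builder. It is a sketch under assumptions: the segment names and scopes come from Table 1 (plus the server segment mentioned above), the default base URL is the example host used throughout, and the helper name and its options are hypothetical. Engine-scope segments get no prefix, score-scope segments are prefixed by the SHA-1 key, and arguments become query key-value pairs via the standard `URL` API.

```javascript
// Segment names and scopes as listed in Table 1.
const ENGINE_SEGMENTS = new Set(["version", "linespace", "server"]);
const SCORE_SEGMENTS = new Set([
  "pagescount", "voicescount", "duration", "pageat", "pagedate", "pagemap",
  "systemmap", "staffmap", "voicemap", "timemap", "midi",
]);

// Hypothetical helper building a request URI for the GUIDO HTTP server.
function guidoUri(segment, { key = null, args = {}, base = "http://guido.grame.fr:8000" } = {}) {
  let path;
  if (ENGINE_SEGMENTS.has(segment)) {
    path = "/" + segment;             // engine scope: no prefix
  } else if (SCORE_SEGMENTS.has(segment)) {
    if (!key) throw new Error(segment + " needs the SHA-1 key of a score");
    path = "/" + key + "/" + segment; // score scope: prefixed by the key
  } else {
    throw new Error("unknown segment: " + segment);
  }
  const url = new URL(path, base);
  for (const [name, value] of Object.entries(args)) {
    url.searchParams.set(name, String(value)); // arguments as key-value pairs
  }
  return url.toString();
}

console.log(guidoUri("version"));
// http://guido.grame.fr:8000/version
console.log(guidoUri("staffmap", { key: "07a21", args: { staff: 1 } }));
// http://guido.grame.fr:8000/07a21/staffmap?staff=1
```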

4.3 Arguments as key-value pairs 

Several of the API functions listed in Table 1 require arguments in order to generate results. For example, the function GuidoGetStaffMap requires an argument staff specifying the staff for which the map should be generated. These arguments are specified as key-value pairs in the URI:

http://guido.grame.fr:8000/<key>/staffmap?staff=1

Default arguments are provided for all argument-taking functions in case the user fails to specify an argument. These defaults are values that work for the majority of scores (for example, page=1) and often come from defaults provided in the API.

4.4 Layout and formatting options as key-value pairs

The GUIDO server allows for the specification of several parameters relating to the layout and formatting of scores as key-value pairs. These parameters are used in several different ways in the GUIDO public API. Some, such as topmargin, become values of structures such as GuidoPageFormat. Others, such as resize, represent calls to functions that affect layout (in this case GuidoResizePageToMusic). Yet others, such as width, are used at several points in the layout process depending on the chosen backend. Rather than devising separate URI construction conventions to represent different layout and formatting information in GUIDO, all layout and formatting options are implemented as key-value pairs, keeping interaction with the server uniform in line with the RESTful style.

4.5 Return values 

In order to handle the diversity of return types provided by the GUIDO API, the server attempts to find MIME types that best approximate the values returned by API functions. Sometimes there is a direct correspondence. For example, the image formats returned by the GUIDOEngine library when compiled with Qt (JPEG, PNG and SVG) all have MIME types (image/jpeg, image/png and image/svg+xml).

In many cases, the GUIDO API returns custom structures that have no MIME type equivalent. In these cases, JSON [Crockford, 2013] is used to represent hierarchical relationships contained within these structures.
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For example, the Time2GraphicMap struct { 
is a composite structure consisting of pairs 
of TimeSegment and FloatRect structures. 
TimeSegment corresponds to beginning and end 
of a musical event whereas FloatRect corre¬ 
sponds to its placement on the page. To rep¬ 
resent these structures in server responses, the 
GUIDO server uses JSON where key-value pairs 
correspond to a structure’s element’s name and 
its value. An example of this is given in Sec¬ 
tion 4.6.3, time corresponds to a TimeSegment 
and graph corresponds to a FloatRect. 
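The mapping from a Time2GraphicMap to JSON can be sketched as follows. The in-memory representation here (an array of [TimeSegment, FloatRect] pairs, with rationals as {num, den} objects) and the function names are assumptions made for illustration; the server itself works against the GUIDO C/C++ API.

```javascript
// Render a rational {num, den} in the "num/den" string form used in responses.
function rationalToString(r) {
  return `${r.num}/${r.den}`;
}

// Serialize [TimeSegment, FloatRect] pairs into the response layout shown in
// Section 4.6.3: { "<key>": { "<command>": [ { graph, time }, ... ] } }.
function time2GraphicMapToJson(key, command, pairs) {
  const entries = pairs.map(([segment, rect]) => ({
    // FloatRect: placement of the event on the page
    graph: { left: rect.left, top: rect.top, right: rect.right, bottom: rect.bottom },
    // TimeSegment: beginning and end of the musical event
    time: { start: rationalToString(segment.start), end: rationalToString(segment.end) },
  }));
  return JSON.stringify({ [key]: { [command]: entries } }, null, 2);
}
```

Keeping dates as "num/den" strings rather than JSON numbers preserves exact rational values, which a float would not.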

4.6 Examples 

4.6.1 voicescount 

The command voicescount returns the number of voices in a score. It exposes the GUIDO Engine API method GuidoCountVoices. For example, the request:

http://guido.grame.fr:8000/<key>/voicescount 
yields the following result: 

{
    "<key>": {
        "voicescount": 1
    }
}

where "<key>" is the SHA-1 key given by the 
URI. 

4.6.2 pageat 

The command pageat returns the page corresponding to a given date, expressed as a rational number. It exposes the GUIDO Engine API method GuidoFindPageAt. For example, the request:

http://guido.grame.fr:8000/<key>/pageat?date=1/4
yields the following result: 

{
    "<key>": {
        "page": 1,
        "date": "1/4"
    }
}

4.6.3 staffmap 

The command staffmap returns a map of the space each element of a given staff takes up in 2D space (represented by a box) and in time (represented as an interval of rational numbers). It exposes the GUIDO Engine API method GuidoGetStaffMap. For example, the request:

http://guido.grame.fr:8000/<key>/staffmap?staff=1 

yields the following result, abbreviated below to minimize its space on the page:

{
    "<key>": {
        "staffmap": [
            {
                "graph": {
                    "left": 916.18,
                    "top": 497.803,
                    "right": 1323.23,
                    "bottom": 838.64
                },
                "time": {
                    "start": "0/1",
                    "end": "1/4"
                }
            },
            {
                "graph": {
                    "left": 2137.33,
                    "top": 497.803,
                    "right": 2595.51,
                    "bottom": 838.64
                },
                "time": {
                    "start": "3/4",
                    "end": "1/1"
                }
            }
        ]
    }
}
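As an illustration of consuming such a response, the sketch below finds the box of the staff element sounding at a given date. Dates stay as exact rationals and are compared by cross-multiplication. The half-open interval convention [start, end) and the function names are assumptions, not part of the server's specification.

```javascript
// Parse "num/den" into integers; reject malformed input.
function parseRational(s) {
  const [num, den] = s.split("/").map(Number);
  if (!Number.isInteger(num) || !Number.isInteger(den) || den <= 0) {
    throw new Error(`bad rational: ${s}`);
  }
  return { num, den };
}

// Comparator: negative if a < b, zero if equal, positive if a > b.
// Exact for the small numerators/denominators typical of musical dates.
function compareRational(a, b) {
  return a.num * b.den - b.num * a.den;
}

// Return the staffmap entry whose time interval [start, end) contains `date`.
function findElementAt(response, key, date) {
  const d = parseRational(date);
  return response[key].staffmap.find(({ time }) =>
    compareRational(parseRational(time.start), d) <= 0 &&
    compareRational(d, parseRational(time.end)) < 0);
}
```

With the abbreviated response above, a date of 1/8 falls in the first entry and 7/8 in the second. Dates between 1/4 and 3/4 return undefined only because those entries were omitted from the printed example.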

5 Conclusion 

The GUIDO HTTP server uses RESTful architectural principles such as statelessness, a uniform interface and a separation of client-server functionality in order to provide low-latency information retrieval. Information corresponds to uploaded GMN scores, encoded as various MIME types and transmitted via the HTTP protocol. The server exposes the robust GUIDO Engine public API via an interface based on standardized URI construction. It is intended for use by various applications needing to visualize musical scores and process score-related data. It is especially well-suited as an alternative to embedding libraries or external applications in score processing software. As cloud computing and mobile human-computer interaction become more common, this form of data transmission and processing is increasingly necessary. The GUIDO HTTP server intends to fill this need by following RESTful architectural recommendations that have proven successful in other server-based services.

The GUIDO project is an open source project hosted by SourceForge. The GUIDO HTTP server is running at

http://guidoservice.grame.fr/. 
Abstract

The Web Audio API is a platform for doing audio synthesis in the browser. Currently it has a number of natively compiled audio nodes capable of doing advanced synthesis. One of the available nodes, the “ScriptProcessorNode”, allows individuals to create their own custom unit generators in pure JavaScript. The Faust project, developed at Grame CNCM, consists of both a language and a compiler and allows individuals to deploy a signal processor to various languages and platforms. This paper examines a technology stack that allows Faust to be compiled to highly optimized JavaScript unit generators that synthesize sound using the Web Audio API.

Keywords 

WebAudio, Faust, Emscripten, JavaScript, asm.js 

1 Introduction 

The Web Audio API, released in 2011, is “a high-level JavaScript API for processing and synthesizing audio in web applications.” 1 Currently there are a number of natively compiled audio nodes within the API capable of doing various forms of synthesis and digital signal processing. One of the available nodes, the “ScriptProcessorNode”, allows individuals to create their own custom unit generators in pure JavaScript, extending the Web Audio API.

While the concept of making interactive sound synthesis environments in the browser is quite exciting, many factors stop individuals from investing time into the Web Audio platform. Ignoring the constraints of a single-threaded environment, there appear to be two primary limitations when working with Web Audio: not enough signal processing JavaScript code has been written yet, and some signal processing concepts prove difficult to implement efficiently in a loosely typed language with no explicit memory management.

1 https://dvcs.w3.org/hg/audio/raw-file/tip/WebAudio/specification.html


2 Some Context 

2.1 WAAX and Flocking 

There are a number of projects in development that abstract over the Web Audio API in order to extend its capabilities, create more complicated unit generators, and allow for a more intuitive syntax. Projects such as WAAX (Web Audio API extension) 2 by Hongchan Choi do so while using only the natively compiled nodes in order to ensure optimum efficiency [H. Choi and J. Berger, 2013].

While these projects offer a wide variety of unit generators and synthesis modules, they cannot be used to implement all cutting-edge techniques. For example, the delay node interface does not offer a tap-in or tap-out function, making waveguide models impossible to implement. 3

The Flocking audio synthesis toolkit 4 by Colin Clark offers a unique declarative model for doing signal processing within the browser. Unlike WAAX, Flocking manages all signal generation internally and uses a single “ScriptProcessorNode” to hand off precomputed buffers of samples.

WAAX and Flocking offer two very different approaches to Web Audio. WAAX offers efficiency, whereas Flocking offers an extensible architecture and declarative syntax in which web developers can write their own first-class custom unit generators. That being said, both projects suffer from the same problem: a lack of man-hours. There are only so many individuals who have the time and domain-specific knowledge necessary to contribute to their development.

2.2 Introduction to Faust 

The Faust project offers a unique solution to this problem: rather than write code, generate it. Faust, developed at Grame CNCM, “is a programming language that provides a purely functional approach to signal processing while offering a high level of performance.” [Orlarey et al., 2009] The project is both a language and a compiler, offering the ability to write code once and deploy it to many different signal processing environments.

2 https://github.com/hoch/waax

3 https://dvcs.w3.org/hg/audio/raw-file/tip/WebAudio/specification.html#DelayNode-section

4 http://flockingjs.org/

Faust also has a community of scientists and developers who have contributed a large amount of code waiting to be compiled to other platforms. For example, Julius Smith has done a substantial amount of research using Faust to implement waveguide synthesis models [Smith et al., 2010], and Romain Michon has ported the entire STK to Faust [Michon and Smith, 2011].

Creating an efficient compile path from Faust to the Web Audio API would allow all of the available Faust code to run in the browser immediately. Further, using the architecture compilation model that Faust is famous for, we would be able to wrap the compiled Web Audio code to be compatible with all current libraries and frameworks such as WAAX and Flocking.

2.3 Current Web Audio Implementation

Currently there is an implementation by Stephane Letz that compiles Faust to Web Audio directly from the Faust Intermediate Representation 5. While the implementation is elegant, any algorithms relying on integer arithmetic are currently broken, because JavaScript represents all Numbers as 64-bit double-precision floating-point values: integer operations do not wrap around at 32 bits, and they lose precision once values exceed 2^53.
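The failure can be reproduced with the linear congruential recurrence from Faust's noise.dsp, y[n] = 12345 + 1103515245 · y[n-1] with signed 32-bit wraparound. The sketch below is illustrative (the function names are ours, not from the Faust code generator). It contrasts plain Number arithmetic with Math.imul and `| 0`, which emulate 32-bit integer multiplication and truncation, and checks both against a BigInt reference.

```javascript
// Naive version: plain Numbers are IEEE 754 doubles, so the product never
// wraps at 32 bits and, once it exceeds 2^53, it also loses precision.
function naiveStep(y) {
  return 12345 + y * 1103515245;
}

// Integer-correct version: Math.imul multiplies as 32-bit integers and
// `| 0` truncates the sum back to a signed 32-bit integer.
function int32Step(y) {
  return (Math.imul(y, 1103515245) + 12345) | 0;
}

// Reference computed with BigInt, wrapped explicitly to signed 32 bits.
function referenceStep(y) {
  return Number(BigInt.asIntN(32, BigInt(y) * 1103515245n + 12345n));
}

// First n output samples of noise.dsp: process = (random / 2147483647.0) * 0.5
function firstSamples(step, n) {
  const out = [];
  let y = 0; // the recursion's one-sample delay starts at zero
  for (let i = 0; i < n; i++) {
    y = step(y);
    out.push(0.5 * (y / 2147483647.0));
  }
  return out;
}

console.log(firstSamples(naiveStep, 3), firstSamples(int32Step, 3));
```

The two versions agree on the first sample and diverge from the second one on, where the naive value is no longer a 32-bit integer and the output leaves the [-0.5, 0.5] range.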

2.4 Introduction to asm.js 

One way to do integer arithmetic with cross-browser support is asm.js. The asm.js specification 6 outlines a ‘strict subset’ of JavaScript that offers a unique programming model. Through the use of typed arrays 7 it is possible to do integer and floating-point arithmetic. This is done with a virtual machine that gives developers access to a heap and functions to be used to manage memory and perform arithmetic operations.
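The sketch below shows what this programming model looks like for the noise generator. It is a hand-written module in the asm.js style, not Emscripten output. Integer state is kept as int32 via `| 0` and Math.imul, and samples are written into a Float32Array view of an ArrayBuffer heap that is addressed by byte offset. The module name and the compute(ptr, n) signature are illustrative assumptions. Engines that do not validate the module as asm.js simply run it as ordinary JavaScript.

```javascript
function NoiseModule(stdlib, foreign, heap) {
  "use asm";
  var imul = stdlib.Math.imul;
  var HEAPF32 = new stdlib.Float32Array(heap);
  var state = 0; // int32 recursion state, persists across calls

  // Write n samples starting at byte offset ptr (must be 4-byte aligned).
  function compute(ptr, n) {
    ptr = ptr | 0;
    n = n | 0;
    var i = 0;
    for (i = 0; (i | 0) < (n | 0); i = (i + 1) | 0) {
      // random = +(12345) ~ *(1103515245), with 32-bit wraparound
      state = (imul(state, 1103515245) + 12345) | 0;
      // noise = random / 2147483647.0; process = noise * 0.5
      HEAPF32[(ptr + (i << 2)) >> 2] = +(state | 0) / 2147483647.0 * 0.5;
    }
  }

  return { compute: compute };
}
```

A host would link it with something like `NoiseModule(globalThis, {}, new ArrayBuffer(0x10000))` and read samples back through `new Float32Array(heap, ptr, n)`. asm.js linking expects the heap length to be a power of two of at least 4096 bytes.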

While it would have been possible to use Stephane Letz’s work as a starting point and extend the current WebAudio architecture to utilize the asm.js subset, it would require quite a bit of overhead. Not only would integer- and floating-point-specific interpretation need to be implemented, but a functional virtual machine would also need to be developed in order to take advantage of asm.js.

5 http://faust.grame.fr/index.php/7-news/73-faust-web-art

6 http://asmjs.org/spec/latest/

7 https://developer.mozilla.org/en-US/docs/Web/JavaScript/Typed_arrays?redirectlocale=en-US&redirectslug=JavaScript/Typed_arrays

Further, we would not see any of the optimization benefits that one would get from a modern compiler such as gcc or clang. In the spirit of this project, a search was done to find a way to automate away the need to worry about all of these complications.

2.5 Introduction to Emscripten 

Emscripten is a project started by Alon Zakai from Mozilla that compiles LLVM (Low Level Virtual Machine) assembly to JavaScript, specifically asm.js [Zakai, 2011]. The platform is both a compiler and a virtual machine capable of running C and C++ code in the browser.

Emscripten gives you an interface to break out C functions so that they can be called from JavaScript. It also provides functions for managing memory in the virtual machine your C code is running in. These functions allow you to allocate new memory to be operated on (in the case of sound buffers) and to manipulate memory in the heap (in order to change parameters).

Currently Faust is able to compile to a C++ file using the minimal.cpp architecture file, and the resulting file can painlessly be compiled to asm.js with Emscripten. The upstream Faust2 branch can compile Faust to LLVM bitcode, which offers another potential compilation path.

3 Making Noise 

A first approach to automating the compilation process from Faust to Web Audio involves manually implementing each step. The Faust code needs to be compiled to C++, and the resulting dsp class must be wrapped in order to allow internal data and member functions to be accessed once compiled to JavaScript. The resulting C++ file then needs to be compiled by Emscripten to asm.js. The asm.js needs to once again be wrapped in order to provide an intuitive JavaScript interface that will operate on the dsp object running in the Emscripten virtual machine. Finally, an interface between the Emscripten virtual machine and Web Audio needs to be made to hand off samples that need to be sonified.
As an initial proof of concept, we attempted to compile the example noise.dsp that ships with Faust to JavaScript by way of Emscripten. Noise was a prime candidate for these initial tests due to the integer-specific calculations used in its algorithm.

The sections below describe the process used to manually implement noise in the browser, starting from a Faust dsp file and ending with a working Web Audio API JavaScript object.


Compilation pipeline (figure):

Faust: original ugen implemented in Faust
  ↓ the Faust compiler converts Faust to C++
C++: wrapped with meta functions to call into the virtual machine
  ↓ Emscripten compiles C++ to LLVM bitcode
LLVM: various optimizations can be run when compiling to LLVM bitcode
  ↓ Emscripten compiles LLVM bitcode to asm.js
asm.js: the Closure Compiler could be used at this step to further optimize compiled code
  ↓ Emscripten breaks out functions that are globally accessible
JavaScript: compiled JavaScript is wrapped to emulate the interface of generic Web Audio ugens


3.1 Faust Source

The noise unit generator starts as a Faust dsp file.

random = +(12345)~*(1103515245);
noise = random/2147483647.0;
process = noise * 0.5;

In order to compile to C++ in a manner that will be compatible with Emscripten, we must use the following command.

faust -a minimal.cpp -i -uim \
    -cn Noise dsp/noise.dsp \
    -o cpp/faust-noise.cpp

This tells faust to compile the above code using the minimal.cpp architecture file, to call the object being created Noise, and to include all necessary header files and dependencies.

The resulting C++ code needs to be wrapped with a series of meta functions that can be called to operate on objects living in the virtual machine. A constructor and destructor are implemented in order to create objects and properly clean them up, and a compute function is then used to grab the latest frame of samples from the unit generator. In order to change the state of the unit generator after it is instantiated in the heap, a number of other functions are available to create a map of the ugen’s parameters and to get and set values.

3.2 Emscripten & asm.js

Once the wrapper has been concatenated with the Faust-compiled C++, it can be compiled by Emscripten to asm.js. This is done with the following command:

emcc cpp/faust-noise.cpp -o js/faust-noise-temp.js \
    -s EXPORTED_FUNCTIONS="[\
'_NOISE_constructor',\
'_NOISE_destructor',\
'_NOISE_compute',\
'_NOISE_getNumInputs',\
'_NOISE_getNumOutputs',\
'_NOISE_getNumParams',\
'_NOISE_getNextParam']"

Note the exported functions, which reference the seven wrapper functions mentioned in the previous step. Listing them is required to stop Emscripten from obfuscating the names of the functions when certain optimization flags are used during compilation, and to make them accessible in the global namespace of JavaScript.


3.3 Web Audio API

Once the asm.js code has been compiled, a JavaScript wrapper is used to break out the functionality of the code into JavaScript functions. As well, the correct context for generating audio in the browser needs to be set up within the Web Audio API, connecting the data generated by the Faust functions to the correct Web Audio API functions in order to produce sound. Again, this wrapper can be found in the source repository on GitHub.
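The hand-off itself can be sketched as an onaudioprocess handler that runs the compiled compute function and then copies each output channel out of the heap into the AudioBuffer that Web Audio provides. The dsp object and its fields (heapF32, outPtrs, compute) are hypothetical stand-ins for whatever the wrapper exposes; they are not the faust2webaudio API.

```javascript
// `dsp` is a hypothetical wrapper object:
//   dsp.heapF32    - Float32Array view of the Emscripten-style heap
//   dsp.outPtrs    - byte offset of each output channel's buffer in the heap
//   dsp.compute(n) - runs the compiled compute function for n frames
function makeAudioProcessHandler(dsp) {
  return function onaudioprocess(event) {
    const output = event.outputBuffer;
    const frames = output.length;
    dsp.compute(frames);
    for (let ch = 0; ch < output.numberOfChannels; ch++) {
      const start = dsp.outPtrs[ch] >> 2; // byte offset -> Float32 index
      // Copy this channel's samples from the heap into the Web Audio buffer.
      output.getChannelData(ch).set(dsp.heapF32.subarray(start, start + frames));
    }
  };
}
```

In a browser this would be attached with something like `const node = ctx.createScriptProcessor(1024, 0, 1); node.onaudioprocess = makeAudioProcessHandler(dsp); node.connect(ctx.destination);`. Copying with `set` avoids allocating a new array on every audio callback.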
4 Results

Using the above methods, a Faust-compiled Web Audio noise unit generator was successfully created. The result can be found at:

http://thealphanerd.io/examples/faust2webaudio/

4.1 Other Examples 

This process has been repeated for a number of other unit generators, including a sine oscillator, freeverb, and a 16th-order FDN reverb (included in the Faust distribution as Reverb Designer). All three examples work in the browser, although the 16th-order FDN takes a few seconds to get going. Once the unit generators have been compiled to JavaScript, it is quite easy to connect them to each other and to other Web Audio components.

Below is an example of how to create a noise object and apply freeverb to its output. Both of these objects have been compiled using Faust2WebAudio. This example can be found online at http://thealphanerd.io/examples/faust2webaudio/freeverb.html

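// Create two Faust-compiled unit generators
var noise = faust.noise();
var freeverb = faust.freeverb();
// Route the noise output into the reverb
noise.connect(freeverb);
// Set parameters by their Faust UI labels
noise.update("Volume", 0.1);
freeverb.update("Damp", 0.75);
freeverb.update("RoomSize", 0.75);
freeverb.update("Wet", 0.75);
// Begin playback
freeverb.play();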

5 Limitations 

Currently the automation layer has not yet been completed. While the wrapper scripts have all been written generically, hand-written bash scripts utilizing tools such as sed are currently being used to compile individual unit generators. A next step would involve moving the generic wrappers into their own architecture file and relying on the Faust build system to handle generic compilation.

Another major limitation is that I am currently utilizing a separate instance of the Emscripten virtual machine for each unique unit generator. This is an unfortunate side effect of the current compilation method: Emscripten includes the virtual machine at the head of every compiled js file. There is an option to statically link a number of compiled js files into a single optimized file with redundancies removed, but I am concerned about the implications of that workflow.

A developer would be required to supply all of the Faust objects at once, and would not have the ability to swap files in and out at their leisure. Unless there is an intuitive and fast way to compile this final file, it will be difficult for individuals to add new unit generators on the fly as they are composing in the browser.

One solution is to utilize a JavaScript task runner such as grunt to watch for changes in specific directories/files and to properly compile and statically link multiple files on the fly.

6 Looking Forward 

While the above-mentioned limitations do need to be worked on, benchmarks should be performed on the currently compiled code to ensure that this compilation method is in fact a good direction.

As well, Stephane Letz and Yann Orlarey have expressed a desire to approach this problem using their original method of going directly from the Faust Intermediate Representation to JavaScript. This would avoid moving from a functional language to an object-oriented language and back to a functional language, which has proven somewhat inelegant. Once the Emscripten method can be benchmarked, it may prove appropriate to put time into developing this more direct compilation path so that the results from the two methods can be compared.

7 Conclusion 

The results of this research have shown that it is indeed possible to get compiled Faust code running properly in the browser. This is very exciting: if the benchmarks are encouraging, we will be able to use the resulting code to greatly expand the ecosystem for digital signal processing in the browser.

One of the most exciting aspects of the results is that, if this process can be perfected, we will continue to see improvements in efficiency as the various technologies we rely on continue to improve. As JavaScript becomes more efficient, so does the compiled code. As Web Audio becomes more stable, so does the compiled code. As asm.js optimizations improve in the browser, we get the optimizations for free. Simply put, even if the resulting benchmarks prove not to be competitive with current hand-written JavaScript, the approach will only get better with time while requiring minimal time maintaining the project.

8 Code Repository 

Find the source online at: 

https://github.com/TheAlphaNerd/faust2webaudio
Abstract

We present extensions to the Ambisonic Decoder Toolbox to efficiently design periphonic decoders for non-uniform speaker arrays such as hemispherical domes and multilevel rings. These techniques include modified inversion, AllRAD, and spherical Slepian function-based decoders. We also describe a new backend for the toolbox that writes out full-featured decoders in the Faust DSP specification language, which can then be compiled into a variety of plug-in formats. Informal listening tests and performance measurements indicate that these decoders work well for speaker arrays that are difficult to handle with conventional design techniques. The computation is relatively quick and more reliable compared to the nonlinear optimization techniques used previously.

Keywords 

Ambisonic decoder, HOA, hemisphere, Faust 

1 Introduction 

This is a paper about extensions to the Ambisonic Decoder Toolbox to efficiently design decoders for loudspeaker arrays with partial coverage of the sphere, such as domes and multilevel rings. The criteria for Ambisonic reproduction are:

• Constant amplitude and energy gain for all 
source directions 

• At low frequencies, reproduced wavefront 
direction and velocity are correct 

• At high frequencies, maximum concentration of energy in the source direction

• Matching high- and low-frequency perceived 
directions 

In the case of decoders for partial-coverage 
arrays, we relax these to apply only to source 
directions that are within the covered part of the 
sphere, but still require that the decoder be “well 
behaved” for sources from other directions. 

Conventional techniques for periphonic decoder design work well when the speakers are distributed uniformly around the listening position. First-order Ambisonics can be accommodated in many listening rooms; however, when moving to higher-order reproduction the need arises to place more loudspeakers below the listener. This requires placing the listening position high in the room or on an acoustically transparent floor with a space below to install speakers. Neither of these is practical for most installations, so hemispherical dome configurations are a popular alternative. In addition, it may be impractical to install speakers directly overhead, resulting in a configuration of horizontal rings of speakers at multiple heights. These configurations leave gaps in coverage below, and possibly above, the listening position.

In a previous paper, we describe a Matlab/GNU Octave 1 toolbox for generating Ambisonic decoders that uses inversion or projection to generate an initial estimate and then nonlinear optimization to simultaneously maximize $r_E$ and minimize directional and loudness errors [2012]. While this works well for small arrays, we found that increasing the Ambisonic order and number of loudspeakers causes the optimizer to converge slowly and get stuck in local minima unless the starting solution is close to optimal. 2

In the case of hemispherical domes and multilevel rings, neither inversion nor projection provides a close starting point. Once the speaker array deviates from uniform geometry, an inversion decoder will trade uniform loudness for directional accuracy by putting more energy in directions where gaps between the loudspeakers are larger. A projection decoder does just the opposite, putting equal energy into all the speakers regardless of spacing; hence such decoders are louder in directions where there are more speakers. In practice, neither provides an adequate starting point for the optimization process.

1 In this paper, we use “Matlab” to refer to both Matlab and GNU Octave. Care has been taken to make sure the code runs in both; however, not all of the graphics work well in Octave. Matlab is a registered trademark of The MathWorks, Inc.

2 A recent paper by Arteaga [2013] takes advantage of symmetries in the loudspeaker array and a reformulation of the objective function to improve the convergence behavior of the optimization process.

The general problem is that it is difficult to pull the sound image beyond the space where there is dense coverage. For the case of hemispheres, this not only means that performance will suffer below the horizon, but that it will be poor at the horizon. Because horizontal performance is uniquely important, it is necessary to make the decoder perform well there, despite the difficulties.

New design techniques have been proposed over the last few years to handle these sorts of arrays. We have implemented these in the toolbox to make them available to a wider user group. The toolbox has been extended beyond third-order decoding, and to support component order and normalization conventions other than Furse-Malham. We also wanted to support a variety of plug-in architectures. A new decoder engine was written in the Faust (Functional Audio Stream) DSP Specification language [Orlarey, Fober, and Letz 2009; Smith 2013a], which includes facilities for dual-band decoding, and near-field, distance, and loudness compensation.

1.1 Auditory Localization 

In this paper we utilize Gerzon’s two main localization models to predict decoder performance: the velocity localization vector, $\mathbf{r}_V$, and the energy localization vector, $\mathbf{r}_E$. These are defined and discussed in our previous paper on the toolbox [Heller, Benjamin, and Lee 2012] (and many other places). Briefly, these models encapsulate the primary interaural time difference (ITD) and interaural level difference (ILD) theories of auditory localization. The direction of each indicates the direction of the localization perception, and the magnitude indicates the quality of the localization. In natural hearing from a single source, the magnitude of each is exactly 1 and the direction is the direction to the source.
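For reference, the standard Gerzon definitions for real loudspeaker gains $g_i$ and unit speaker directions $\hat{\mathbf{u}}_i$ are $\mathbf{r}_V = \sum_i g_i \hat{\mathbf{u}}_i / \sum_i g_i$ and $\mathbf{r}_E = \sum_i g_i^2 \hat{\mathbf{u}}_i / \sum_i g_i^2$. The sketch below is a direct transcription of these formulas for a single source direction. It is an illustration only, not the toolbox's Matlab code, and the function names are hypothetical.

```javascript
// Weighted mean of unit direction vectors: sum(w_i u_i) / sum(w_i).
function weightedDirection(dirs, weights) {
  let total = 0;
  const v = [0, 0, 0];
  dirs.forEach((u, i) => {
    total += weights[i];
    for (let k = 0; k < 3; k++) v[k] += weights[i] * u[k];
  });
  return v.map((x) => x / total);
}

// Velocity (r_V, amplitude-weighted) and energy (r_E, power-weighted)
// localization vectors, plus their magnitudes r_V and r_E.
function gerzonVectors(dirs, gains) {
  const rV = weightedDirection(dirs, gains);
  const rE = weightedDirection(dirs, gains.map((g) => g * g));
  return { rV, rE, rVmag: Math.hypot(...rV), rEmag: Math.hypot(...rE) };
}
```

A single active speaker gives magnitudes of exactly 1, matching natural hearing. Two equal-gain speakers at ±45° give $r_E = r_V = \cos 45° \approx 0.707$, pointing midway between them.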

1.2 Math Notation 

We use lowercase bold roman type to denote vectors ($\mathbf{v}$), uppercase bold roman type to denote matrices ($\mathbf{M}$), italic type to denote scalars ($s$), and sans serif type to denote signals ($\mathsf{W}$). A scalar with the same name as a vector denotes the magnitude of the vector. A vector with a circumflex (“hat”) is a unit vector, so, for example, $\hat{\mathbf{r}}_E = \mathbf{r}_E / r_E$. $\mathbf{A}^{\dagger}$ is the Moore-Penrose pseudoinverse of $\mathbf{A}$ (pinv(A) in Matlab) and $\mathbf{A}^{\mathsf{T}}$ is the transpose of $\mathbf{A}$ (A.' in Matlab).

2 Decoder Design Techniques for Domes and Multilevel Rings

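In Ambisonics, the standard technique for deriving the basic decoder matrix, $\mathbf{M}$, is to invert the matrix, $\mathbf{K}$, whose columns are composed of the spherical harmonics sampled at the speaker positions, such that $\mathbf{K}\mathbf{M} = \mathbf{I}$, where $\mathbf{I}$ is the identity matrix; that is, re-encoding the speaker signals produced by the decoder recovers the original components [Gerzon 1980; Heller, Lee, and Benjamin 2008]. 3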

Because $\mathbf{K}$ is “encoding” the speaker positions, some authors call it the reencoding matrix and refer to the inversion as mode matching. In the general case, $\mathbf{K}$ is rank deficient, so the inversion must be done by least squares or by using singular-value decomposition (SVD) and the Moore-Penrose pseudoinverse.
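A minimal numerical sketch of mode matching follows. When $\mathbf{K}$ (components × speakers) has full row rank, its pseudoinverse is $\mathbf{K}^{\mathsf{T}}(\mathbf{K}\mathbf{K}^{\mathsf{T}})^{-1}$, and it satisfies $\mathbf{K}\mathbf{M} = \mathbf{I}$. The example makes several assumptions for brevity: first-order horizontal-only components ordered W, X, Y with an unnormalized basis (1, cos θ, sin θ), and hypothetical helper names. It is not the toolbox's implementation, which works in Matlab with full periphonic spherical harmonics.

```javascript
function transpose(A) {
  return A[0].map((_, j) => A.map((row) => row[j]));
}

function matmul(A, B) {
  return A.map((row) => B[0].map((_, j) => row.reduce((acc, a, k) => acc + a * B[k][j], 0)));
}

// Gauss-Jordan elimination with partial pivoting (square matrices only).
function invert(A) {
  const n = A.length;
  const M = A.map((row, i) => [...row, ...row.map((_, j) => (i === j ? 1 : 0))]);
  for (let c = 0; c < n; c++) {
    let p = c;
    for (let r = c + 1; r < n; r++) if (Math.abs(M[r][c]) > Math.abs(M[p][c])) p = r;
    if (Math.abs(M[p][c]) < 1e-12) throw new Error("matrix is singular to working precision");
    [M[c], M[p]] = [M[p], M[c]];
    const pivot = M[c][c];
    for (let j = 0; j < 2 * n; j++) M[c][j] /= pivot;
    for (let r = 0; r < n; r++) {
      if (r === c) continue;
      const f = M[r][c];
      for (let j = 0; j < 2 * n; j++) M[r][j] -= f * M[c][j];
    }
  }
  return M.map((row) => row.slice(n));
}

// Columns of K are the components (W, X, Y) sampled at each speaker azimuth.
function horizontalFirstOrderK(azimuthsDeg) {
  const az = azimuthsDeg.map((d) => (d * Math.PI) / 180);
  return [az.map(() => 1), az.map((a) => Math.cos(a)), az.map((a) => Math.sin(a))];
}

// Moore-Penrose pseudoinverse of a wide, full-row-rank matrix.
function pinvWide(K) {
  const Kt = transpose(K);
  return matmul(Kt, invert(matmul(K, Kt)));
}

const K = horizontalFirstOrderK([45, 135, 225, 315]);
const M = pinvWide(K); // 4 speakers x 3 components
```

For this regular square array $\mathbf{K}\mathbf{K}^{\mathsf{T}} = \mathrm{diag}(4, 2, 2)$, so row $i$ of $\mathbf{M}$ is $[1/4,\ \cos\theta_i/2,\ \sin\theta_i/2]$. For irregular layouts $\mathbf{K}\mathbf{K}^{\mathsf{T}}$ becomes ill-conditioned, which is the problem the following subsections address.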

Problems arise when a given loudspeaker array does a poor job of sampling some of the spherical harmonics, such as sampling at or near zero crossings or having more than one zero crossing between samples. In these cases, $\mathbf{K}$ will be ill-conditioned (difficult to invert without loss of precision) and the resulting decoder will have greater energy gain in certain directions, resulting in reduced $r_E$ and greater loudness in those directions.

In the following subsections, we discuss three 
strategies implemented in the toolbox: 

• Use an inversion technique suited to ill-conditioned problems

• Invert a well-behaved full-sphere coverage 
array, map to the real array 

• Derive a new set of basis functions for which 
the inversion is well behaved 

2.1 Modified Inversion 

One proposed solution is to set all of the singular values to 1 when computing the pseudoinverse [Pomberger and Zotter 2012]. This has the effect of diminishing the use of the poorly sampled spherical harmonics. The resulting decoder has constant energy (hence, loudness) in all directions, at the expense of increased directional errors.

Another solution is to use a truncated SVD when computing the pseudoinverse. This simply discards the poorly sampled spherical harmonics. In the conventional pseudoinverse (e.g., as implemented in Matlab), normalized singular values 4 less than $10^{-15}$ are not inverted. In a truncated SVD, a much larger threshold is used. For example, setting the threshold to $1/\sqrt{2}$ puts an upper limit of 3 dB on the loudness variations, again at the expense of increased directional errors.

3 The term sampling is used here to mean evaluating the given spherical harmonic function at a particular azimuth and elevation.

The toolbox also can produce decoders that are a linear combination of the conventional pseudoinverse and these alternatives, providing a single parameter to trade off uniform loudness and directional accuracy. Other approaches to inverting ill-conditioned matrices have been applied to this problem, such as Tikhonov regularization [Poletti 2005] and LASSO (least absolute shrinkage and selection operator) [Chen and Huang 2013]. Currently, we have not implemented these, although the linear combination approach described above provides a result similar to Tikhonov regularization.
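All three variants differ only in what they do to the singular values. Write $\mathbf{K} = \mathbf{U}\mathbf{S}\mathbf{V}^{\mathsf{T}}$, so that $\mathbf{K}\mathbf{K}^{\mathsf{T}} = \mathbf{U}\mathbf{S}^2\mathbf{U}^{\mathsf{T}}$. Then any decoder $\mathbf{V} f(\mathbf{S}) \mathbf{U}^{\mathsf{T}}$ equals $\mathbf{K}^{\mathsf{T}}\mathbf{U}\,\mathrm{diag}(f(s_i)/s_i)\,\mathbf{U}^{\mathsf{T}}$. The sketch below uses this identity with a cyclic Jacobi eigen-decomposition of the small symmetric matrix $\mathbf{K}\mathbf{K}^{\mathsf{T}}$, and it assumes $\mathbf{K}$ has full row rank. The mode names and helpers are hypothetical, and this is not the toolbox's code, which uses Matlab's SVD.

```javascript
function transpose(A) {
  return A[0].map((_, j) => A.map((row) => row[j]));
}

function matmul(A, B) {
  return A.map((row) => B[0].map((_, j) => row.reduce((acc, a, k) => acc + a * B[k][j], 0)));
}

// Cyclic Jacobi eigen-decomposition of a small symmetric matrix S = V D V^T.
function jacobiEigen(S, maxSweeps = 100) {
  const n = S.length;
  const A = S.map((row) => row.slice());
  const V = S.map((_, i) => S.map((__, j) => (i === j ? 1 : 0)));
  for (let sweep = 0; sweep < maxSweeps; sweep++) {
    let off = 0;
    for (let p = 0; p < n; p++) for (let q = p + 1; q < n; q++) off += A[p][q] ** 2;
    if (off < 1e-24) break;
    for (let p = 0; p < n; p++) {
      for (let q = p + 1; q < n; q++) {
        if (A[p][q] === 0) continue;
        // Rotation angle that zeroes A[p][q].
        const theta = (A[q][q] - A[p][p]) / (2 * A[p][q]);
        const t = (theta >= 0 ? 1 : -1) / (Math.abs(theta) + Math.sqrt(theta * theta + 1));
        const c = 1 / Math.sqrt(t * t + 1);
        const s = t * c;
        const rotateCols = (X) => {
          for (let k = 0; k < n; k++) {
            const xp = X[k][p], xq = X[k][q];
            X[k][p] = c * xp - s * xq;
            X[k][q] = s * xp + c * xq;
          }
        };
        rotateCols(A); // A <- A J
        for (let k = 0; k < n; k++) { // A <- J^T A
          const ap = A[p][k], aq = A[q][k];
          A[p][k] = c * ap - s * aq;
          A[q][k] = s * ap + c * aq;
        }
        rotateCols(V); // eigenvectors accumulate as columns of V
      }
    }
  }
  return { values: A.map((row, i) => row[i]), vectors: V };
}

// mode: "pinv" (conventional), "unity" (all singular values set to 1),
// "truncated" (normalized singular values below `threshold` discarded).
function spectralDecoder(K, mode, threshold = 0) {
  const Kt = transpose(K);
  const { values, vectors: U } = jacobiEigen(matmul(K, Kt));
  const s = values.map((lambda) => Math.sqrt(Math.max(lambda, 0))); // singular values of K
  const sMax = Math.max(...s);
  const w = s.map((si) => {
    if (si <= 1e-12 * sMax) return 0; // numerically zero: never inverted
    if (mode === "pinv") return 1 / (si * si);
    if (mode === "unity") return 1 / si;
    if (mode === "truncated") return si >= threshold * sMax ? 1 / (si * si) : 0;
    throw new Error(`unknown mode: ${mode}`);
  });
  const n = U.length;
  const W = U.map((_, i) => U.map((__, j) => {
    let acc = 0;
    for (let k = 0; k < n; k++) acc += U[i][k] * w[k] * U[j][k]; // U diag(w) U^T
    return acc;
  }));
  return matmul(Kt, W); // speakers x components
}

// Single-parameter trade-off: (1 - alpha) * A + alpha * B.
function blendDecoders(A, B, alpha) {
  return A.map((row, i) => row.map((a, j) => (1 - alpha) * a + alpha * B[i][j]));
}
```

The "unity" decoder satisfies $\mathbf{M}^{\mathsf{T}}\mathbf{M} = \mathbf{I}$, so its total energy gain is the same for every encoded direction. That is the constant-loudness property described above; exact re-encoding ($\mathbf{K}\mathbf{M} = \mathbf{I}$) is given up in exchange.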

2.2 Hybrid Ambisonic-VBAP Decoding 

The hybrid Ambisonic-VBAP approach is called “All Round Ambisonic Decoding” (AllRAD) by Zotter and Frank [2012]. Briefly, one computes a decoder for a uniform array of virtual speakers and then maps the signals for the virtual array to the real loudspeaker array using Vector Base Amplitude Panning (VBAP) [Pulkki 1997].

VBAP always produces the smallest possible angular spread of energy for a given panning direction and speaker array; hence the perceived size of a virtual source changes depending on direction. This is directly at odds with the Ambisonic approach, which tries to keep the perceived size of a virtual source constant regardless of source direction. AllRAD uses two strategies to mitigate this:

1. The number of virtual speakers is made much larger than the number of real speakers.

2. Imaginary speakers are inserted to fill in large gaps in the real loudspeaker array in order to keep the triangular faces of the tessellation as regular as possible.

AllRAD places the virtual speakers according to a spherical t-design [Hardin and Sloane 2002]. A spherical t-design of degree t is a finite set of points on a sphere such that the average of any polynomial of degree t or less over the sphere is equal to the average value of the polynomial sampled at the points in the set. The present implementation uses the 240-point spherical t-design for the virtual array, which is the largest currently known t-design.

4 The set of singular values divided by the largest one.

Figure 1: Plot of real speaker locations for the upper hemisphere in CCRMA’s Listening Room (black hexagrams), unit sphere tessellation, and intersection points of 240 virtual speaker directions (green plus signs). The speaker at the bottom is an imaginary speaker added to keep the facets of the tessellation as regular as possible. The locations of the intersection points are used to calculate the VBAP gains to the real speakers.
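The defining property can be checked numerically: a point set is a t-design exactly when its average of every monomial $x^a y^b z^c$ with $a+b+c \le t$ matches the sphere average. The sphere average is $(a-1)!!\,(b-1)!!\,(c-1)!!/(a+b+c+1)!!$ when all exponents are even, and 0 otherwise. The sketch below is a check under that formula, with hypothetical helper names. Since the 240-point design's coordinates are not reproduced here, it is exercised on the octahedron (a 3-design) and the icosahedron (a 5-design).

```javascript
function doubleFactorial(n) {
  let r = 1;
  for (let k = n; k > 1; k -= 2) r *= k;
  return r; // (-1)!! = 0!! = 1
}

// Average of x^a y^b z^c over the unit sphere (zero if any exponent is odd).
function sphereMonomialAverage(a, b, c) {
  if (a % 2 || b % 2 || c % 2) return 0;
  return (doubleFactorial(a - 1) * doubleFactorial(b - 1) * doubleFactorial(c - 1)) /
    doubleFactorial(a + b + c + 1);
}

// Average of the same monomial over a finite set of unit vectors.
function pointMonomialAverage(points, a, b, c) {
  return points.reduce((acc, [x, y, z]) => acc + x ** a * y ** b * z ** c, 0) / points.length;
}

// True if every monomial of total degree <= t averages correctly.
function isSphericalTDesign(points, t, tol = 1e-9) {
  for (let a = 0; a <= t; a++) {
    for (let b = 0; a + b <= t; b++) {
      for (let c = 0; a + b + c <= t; c++) {
        const diff = pointMonomialAverage(points, a, b, c) - sphereMonomialAverage(a, b, c);
        if (Math.abs(diff) > tol) return false;
      }
    }
  }
  return true;
}
```

Monomials suffice because every polynomial of degree at most t is a linear combination of them, and both averages are linear.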

There are three steps to the design of an AllRAD decoder:

1. Select a spherical t-design for the array of virtual speakers and compute a decoder for it. Because the virtual speakers are distributed uniformly on the sphere, the inversion is well behaved.

(a) Compose the matrix $\mathbf{K}_V$ whose columns are the spherical harmonics sampled at the directions of the virtual speakers.

(b) Compute the decoder matrix for the virtual array, $\mathbf{M}_V = \mathbf{K}_V^{\dagger}$.

2. Compute the matrix of VBAP gains for each 
virtual speaker. 

(a) Project the positions of the real speak¬ 
ers onto the unit sphere. 

(b) Add imaginary speakers to the array to 
fill in any gaps larger than 90°. For a 
dome this will be one at the bottom. 
For a multilevel ring, one at the top 
and one at the bottom. The distance 
from the center determines how quickly 
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(a) rg vs. Test Direction (b) Te Direction Error (degrees) (c) Energy Gain (dB) 

Figure 2: The A11RAD decoder’s performance for the upper hemisphere of CCRMA’s Listening Room. 
These show the (a) energy concentration, (b) directional accuracy, and (c) loudness of sources from various 
directions. Directional errors are clipped at 10° so that smaller errors can be seen. The plots have been 
quantized to make the structure clearer. Note that the Mercator projection used overemphasizes the poles. 


sources fade as they move outside the 
region of the sphere covered by the real 
speaker array. 

(c) Compute the triangular tessellation of 
the convex hull of the projected speaker 
positions. 

(d) Determine the intersection point of the 
vector to each virtual speaker with the 
faces of the convex hull. 

(e) Calculate the barycentric coordinates 
of each intersection point. These are 
the VBAP gains from that virtual 
speaker to the three real speakers at 
the vertices of the face. 

(f) Assemble the matrix of the VBAP 
gains, Gy->R. This matrix has one col¬ 
umn for each virtual speaker and one 
row for each real speaker. Each col¬ 
umn will have up to three gains for that 
virtual speaker from the previous step. 
Gains to imaginary speakers are omit¬ 
ted. 

3. The basic decoder matrix is 

M = Gy ,r My. 
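Steps 2(d) to 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the toolbox’s code:

- The tessellation (a list of index triples into the speaker list) is assumed to be computed already.
- Imaginary speakers are assumed to be stored after the real ones.
- The intersection point and its barycentric coordinates are obtained together. The code solves d = g1·l1 + g2·l2 + g3·l3 by Cramer’s rule and then scales g to sum to one.

The function names are hypothetical.

```python
def det3(a, b, c):
    """Determinant of the 3x3 matrix with columns a, b, c, i.e. a . (b x c)."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            + a[1] * (b[2] * c[0] - b[0] * c[2])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def face_gains(d, l1, l2, l3, eps=1e-9):
    """Barycentric coordinates of the point where the ray along d meets face (l1, l2, l3).

    Solves d = g1*l1 + g2*l2 + g3*l3 by Cramer's rule; returns None if the ray
    misses the face (a negative coefficient) or the face is degenerate.
    """
    D = det3(l1, l2, l3)
    if abs(D) < eps:
        return None
    g = [det3(d, l2, l3) / D, det3(l1, d, l3) / D, det3(l1, l2, d) / D]
    if min(g) < -eps:
        return None
    total = sum(g)
    return [gi / total for gi in g]  # rescale so the point lies on the face: gains sum to 1

def gain_matrix(virtual_dirs, faces, speakers, n_real):
    """G_{V->R}: one row per real speaker, one column per virtual speaker.

    speakers lists unit vectors, with imaginary speakers stored after the
    first n_real entries; faces holds index triples from the tessellation.
    """
    G = [[0.0] * len(virtual_dirs) for _ in range(n_real)]
    for j, d in enumerate(virtual_dirs):
        for face in faces:
            g = face_gains(d, *(speakers[i] for i in face))
            if g is None:
                continue
            for i, gi in zip(face, g):
                if i < n_real:          # gains to imaginary speakers are omitted
                    G[i][j] = gi
            break                       # use the first face the ray hits
    return G

def matmul(A, B):
    """Plain matrix product, e.g. M = G_{V->R} M_V."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Toy example: three speakers on the axes, the third one treated as imaginary.
speakers = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
r = 3 ** -0.5
G = gain_matrix([(r, r, r), (1, 0, 0)], [(0, 1, 2)], speakers, n_real=2)
print(G)  # approximately [[0.333, 1.0], [0.333, 0.0]]
```

Dropping the row of an imaginary speaker is what makes sources fade as they move toward it. A virtual speaker lying entirely on an imaginary vertex contributes nothing to the real feeds.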

Figure 1 shows the real and imaginary speaker positions, the tessellation of the speaker directions, and the intersection points of the vectors to each virtual speaker with the faces of the tessellation. The example shown is for the upper hemisphere of loudspeakers in CCRMA’s Listening Room. Figure 2 shows the performance of the AllRAD decoder used in the listening tests.


2.3 Spherical Slepian Function Decoding

Spherical Slepian functions (SSF) are linear combinations of spherical harmonics that produce new basis functions. These functions are approximately zero outside the chosen region of the sphere but remain orthogonal within the region of interest. This makes them suitable for decomposing spherical-harmonic models into portions that have significant energy only in selected areas [Beggan et al. 2013; Simons, Dahlen, and Wieczorek 2006]. They have been used in satellite geodesy to model the magnetic and gravitational fields of the earth from satellite data that do not cover the whole earth. In designing Ambisonic decoders, they allow us to specify a region of interest on the sphere and derive a new set of basis functions that is well conditioned within that region. Zotter et al. call this “Energy-Preserving Ambisonic Decoding” (EPAD) [2012]. The procedure implemented in the toolbox is described here.

1. Define the subset of the surface of the sphere for the decoder, $\mathcal{R} \subset S^2$, where $S^2$ denotes the surface of the unit sphere in $\mathbb{R}^3$. To assure good performance at the boundary, select it to be a bit larger than the area covered by the loudspeakers. For the decoder tested, we used −30° to 90° elevation.

2. Compose the Gramian matrix, $G$, of the inner products of the real spherical harmonics, $Y_{lm}(\boldsymbol{\theta})$, over the region $\mathcal{R}$. Each element, $g_{lm,l'm'}$, of $G$ is given by

$$g_{lm,l'm'} = \langle Y_{lm}, Y_{l'm'} \rangle_{\mathcal{R}} = \int_{\mathcal{R}} Y_{lm}(\boldsymbol{\theta}) \, Y_{l'm'}(\boldsymbol{\theta}) \, d\boldsymbol{\theta},$$

where $lm$ is a single-index designator for the real spherical harmonic of degree $l$ and order $m$, $\boldsymbol{\theta} = [\cos\theta\cos\phi \;\; \sin\theta\cos\phi \;\; \sin\phi]^T$, and $\theta$ and $\phi$ are azimuth and elevation.

3. Compute the eigendecomposition $G = U \Lambda U^{-1}$. $U$ is a unitary matrix whose columns are the eigenvectors of $G$. The diagonal elements of $\Lambda$ are the corresponding eigenvalues.

4. Compose a new matrix, $U_{SSF}$, by selecting the columns of $U$ with eigenvalues above some threshold, $\alpha$. $\alpha$ should be approximately the fraction of the sphere covered by the region of interest. For a hemispherical dome, we use $\alpha = \ldots$. Its transpose, $U_{SSF}^T$, transforms coefficients in the spherical-harmonic basis to coefficients in the new SSF basis.

5. Compose the speaker reencoding matrix, $K$, whose columns are the spherical harmonics sampled at each speaker direction. Transform it to the new basis, $K_{SSF} = U_{SSF}^T K$.

6. Compute the basic decoder matrix, $M = K_{SSF}^{+} \, U_{SSF}^T$, where $^{+}$ denotes the pseudoinverse.
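Steps 2 to 4 can be tried out at first order with the following Python sketch. It is not the toolbox’s implementation and rests on several assumptions:

- The basis is orthonormal real spherical harmonics in ACN order up to degree 1.
- The region is an elevation band, here the upper hemisphere.
- The integral is approximated with a midpoint rule.
- A plain cyclic Jacobi iteration stands in for a library eigensolver.

Because the functions are orthonormal over the whole sphere, the eigenvalues of $G$ lie between 0 and 1. They also sum to (number of functions) × (fraction of the sphere covered), which is consistent with choosing $\alpha$ near that fraction.

```python
import math

C0 = 1 / math.sqrt(4 * math.pi)      # degree-0 normalization
C1 = math.sqrt(3 / (4 * math.pi))    # degree-1 normalization

def real_sh_deg1(az, el):
    """Orthonormal real SH up to degree 1, ACN order: Y00, Y1-1 (y), Y10 (z), Y11 (x)."""
    x = math.cos(az) * math.cos(el)
    y = math.sin(az) * math.cos(el)
    z = math.sin(el)
    return [C0, C1 * y, C1 * z, C1 * x]

def gramian(el_min, el_max, n_az=72, n_el=180):
    """G[i][j] ~= integral over the elevation band of Y_i * Y_j dA (midpoint rule)."""
    n = 4
    G = [[0.0] * n for _ in range(n)]
    d_az = 2 * math.pi / n_az
    d_el = (el_max - el_min) / n_el
    for a in range(n_az):
        az = (a + 0.5) * d_az
        for e in range(n_el):
            el = el_min + (e + 0.5) * d_el
            w = math.cos(el) * d_az * d_el       # area element on the unit sphere
            Y = real_sh_deg1(az, el)
            for i in range(n):
                for j in range(n):
                    G[i][j] += w * Y[i] * Y[j]
    return G

def jacobi_eigenvalues(A, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations, sorted descending."""
    a = [row[:] for row in A]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):              # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):              # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

G = gramian(0.0, math.pi / 2)             # upper hemisphere
lam = jacobi_eigenvalues(G)
print([round(v, 3) for v in lam])         # [0.933, 0.5, 0.5, 0.067]
print(round(sum(lam), 3))                 # 2.0, i.e. 4 functions x 1/2 of the sphere
```

The largest eigenvalue belongs to a combination of W and Z that points upward and is well concentrated in the region. The smallest belongs to its downward-pointing counterpart, which a threshold would discard.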

Figure 3 shows balloon plots of all 16 spherical Slepian basis functions for the region −30° to 90° elevation on the sphere. Note that the first eight are concentrated in the upper hemisphere, the next two in the middle, and the last six in the lower hemisphere. The first 13 (those with $\lambda > \alpha$) were used for the third-order decoder we tested. One observation is that this method creates basis functions that have a clearer relationship with source directions, which is not the case for the spherical harmonics above first order. Figure 4 shows the performance of the SSF decoder used in the listening tests.

2.4 Max-$r_E$ Decoders

The basic decoder matrices, $M$, calculated in the preceding sections, are transformed into max-$r_E$ decoders by multiplying by a matrix, $\Gamma$. The diagonal entries of $\Gamma$ are the per-order gains that maximize $r_E$ over the sphere:

$$M_{\text{max-}r_E} = M \, \Gamma.$$

The calculation of these gains is discussed in the appendix of [Heller, Benjamin, and Lee 2012].
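The gains themselves are derived in the cited appendix. As a self-contained illustration, we assume the widely used three-dimensional result that the max-$r_E$ weights are $g_l = P_l(r_E)$. Here $P_l$ is the Legendre polynomial of degree $l$ and $r_E$ is the largest root of $P_{N+1}$ for an order-$N$ decoder. The sketch also assumes ACN channel ordering when forming $M\,\Gamma$. The function names are ours.

```python
import math

def legendre(n, x):
    """Legendre polynomial P_n(x) via Bonnet's recurrence."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def largest_root(n, step=1e-3, iterations=60):
    """Largest root of P_n: scan down from x = 1 until the sign changes, then bisect."""
    hi, lo = 1.0, 1.0 - step
    while legendre(n, lo) > 0:
        hi, lo = lo, lo - step
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if legendre(n, mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def max_re_gains(order):
    """Per-degree weights g_l = P_l(r_E), with r_E the largest root of P_(order+1)."""
    r_e = largest_root(order + 1)
    return [legendre(l, r_e) for l in range(order + 1)]

def apply_gamma(M, order):
    """M Gamma: scale column j (ACN order) by the gain of its degree, floor(sqrt(j))."""
    g = max_re_gains(order)
    return [[m * g[math.isqrt(j)] for j, m in enumerate(row)] for row in M]

print([round(v, 3) for v in max_re_gains(3)])  # [1.0, 0.861, 0.612, 0.305]
```

At first order, this gives the familiar weight $1/\sqrt{3}$ for the first-degree components.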

3 In-situ Performance Measurements

The Ambisonic decoder design philosophies discussed above are generally intended to optimize the psychoacoustically based parameters of the Gerzon Energy Vector theory. Those parameters are expected to generally predict the subjective performance of the system. However, they are not the same as the parameters that directly predict what is heard by the listeners. We use measurements of the interaural time difference (ITD) and interaural level difference (ILD) to gauge the localization performance in actual systems. ITDs are known to predict localization of low-frequency sounds, and ILDs are known to predict the localization of high-frequency sounds.

A group of measurements was performed in CCRMA’s Listening Room at Stanford University.⁵ That room is equipped with 22 loudspeakers arranged as follows:

- a horizontal ring of eight loudspeakers,
- rings of six loudspeakers at +40° and −50° elevation,
- one loudspeaker each at the zenith and nadir.

This allowed the option of either using the full spherical array or using decoders designed specifically to drive the upper 15 loudspeakers as a hemisphere. One decoder was derived by using the AllRAD method and the other by using an SSF basis set.

The ITDs and ILDs created by real systems were measured by using a dummy head to record test signals reproduced from a variety of directions. The test signals are ambisonically panned exponential sine sweeps, from which the impulse response for each direction is computed. These are binaural impulse responses, from which the ITDs and ILDs can be derived.

The ITDs were calculated by band-pass filtering the impulse responses to the bandwidth of interest and comparing the time of arrival at the two ears of the dummy head. Performing the calculation at a 192 kHz sample rate gives a time resolution of about 5 µs (1/192 000 s ≈ 5.2 µs). The measurement was repeated in each of the 37 directions at 10° intervals around the horizon, and for each of the three decoders being evaluated. The result is shown in Figure 5a. All three decoders provide a plausible ITD result. The significant differences occur at the sides.
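The paper does not specify how the time of arrival is detected. One simple possibility, assumed here purely for illustration, is to take the first sample at which each impulse response reaches half of its peak magnitude. The band-pass filtering step is omitted, so the responses are assumed to be filtered already. The function names are ours.

```python
FS = 192_000  # sample rate used for the measurements (Hz)

def arrival_index(ir, threshold=0.5):
    """First sample index whose magnitude reaches threshold * peak (simple onset detection)."""
    peak = max(abs(s) for s in ir)
    return next(n for n, s in enumerate(ir) if abs(s) >= threshold * peak)

def itd_seconds(ir_left, ir_right, fs=FS, threshold=0.5):
    """Interaural time difference; positive when the sound reaches the right ear first."""
    return (arrival_index(ir_left, threshold) - arrival_index(ir_right, threshold)) / fs

# Toy responses: the right ear leads by 10 samples.
left = [0.0] * 20 + [1.0, 0.5] + [0.0] * 10
right = [0.0] * 10 + [0.8, 0.4] + [0.0] * 20
print(itd_seconds(left, right) * 1e6)  # about 52.1 microseconds
```

With a whole-sample onset, the resolution is one sample period. This is the 5.2 µs figure quoted above.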

ILDs are considerably more complex than ITDs, with the major differences between the two ears occurring at frequencies above 1 kHz. As a simplification to make comparison easier, the ILD was calculated as an average level from 1 to 4 kHz. As for the ITDs, the ILD was calculated at 10° intervals around the horizon. The results are shown in Figure 5b.
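A band-averaged ILD can be sketched as the ratio of the energies of the two ear responses within the 1 to 4 kHz band. Averaging energy rather than dB values is our assumption. A naive DFT is used only to keep the sketch dependency-free, and the names are hypothetical.

```python
import cmath
import math

def band_energy(ir, fs, f_lo=1000.0, f_hi=4000.0):
    """Sum of |X[k]|^2 over the DFT bins whose frequency lies in [f_lo, f_hi]."""
    n = len(ir)
    k_lo = math.ceil(f_lo * n / fs)
    k_hi = math.floor(f_hi * n / fs)
    energy = 0.0
    for k in range(k_lo, k_hi + 1):
        X = sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(ir))
        energy += abs(X) ** 2
    return energy

def ild_db(ir_left, ir_right, fs):
    """Level of the left ear relative to the right ear (dB), energy-averaged over 1-4 kHz."""
    return 10 * math.log10(band_energy(ir_left, fs) / band_energy(ir_right, fs))

left = [1.0] + [0.0] * 255          # toy impulse responses
right = [0.5 * s for s in left]     # right ear 6 dB quieter
print(round(ild_db(left, right, fs=192_000), 2))  # 6.02
```

With real measurements, a longer response (or zero-padding) gives more bins inside the band and thus a smoother average.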

The three decoders produce substantially different values of ILD for sounds coming from the sides. It should be noted that the high values of ILD come from cancellation of signals on the opposite side of the head from the sound source by diffraction of sound traveling around the head.

5 https://ccrma.stanford.edu/room-guides/listening-room

Figure 3: Balloon plots of all 16 spherical Slepian basis functions for the region −30° to 90° elevation on the sphere. Lobes with reversed polarity are shown in blue. Note that the first eight are concentrated in the upper hemisphere, the next two in the middle, and the last six in the lower hemisphere. The first 13 (those with $\lambda > \alpha$) were used for the third-order decoder we tested.

(a) $r_E$ vs. Test Direction (b) $r_E$ Direction Error (degrees) (c) Energy Gain (dB)

Figure 4: The Spherical Slepian function decoder’s performance. These show the (a) energy concentration, (b) directional accuracy, and (c) loudness of sources from various directions. Directional errors are clipped at 10°.

Because the results of the ITD and, particularly, the ILD measurements are so complex, the analysis of their effect is quite difficult and beyond the scope of the present paper. That analysis will be published in a subsequent paper.

4 Listening tests 

We conducted informal (non-blind) listening tests of third-order, single-band max-$r_E$ AllRAD and SSF-based decoders. The tests used the 15 loudspeakers comprising the upper hemispherical dome in the Listening Room at Stanford’s CCRMA. The decoders computed by the toolbox were saved as AmbDec configuration files and loaded into multiple instances of AmbDec so that rapid comparisons could be made.

As a reference, we also listened to full-sphere playback of the test material over all 22 loudspeakers in the Listening Room. This used the third-order, two-band decoder described in the previous paper [Heller, Benjamin, and Lee 2012]. Playback levels of all three decoders were matched by ear.

The test material comprised two third-order recordings. The first was a full-sphere mix by Jay Kadis, CCRMA’s audio engineer, of “Babel” by Allette Brooks.⁶ The second was Jörn Nettingsmeier’s recording of Chroma XII by Rebecca Saunders [Nettingsmeier 2012]. Playback was directly from the Ardour sessions for each piece. This gave us the capability to move individual elements of the mix spatially, to test performance from a wider variety of directions, and to solo individual tracks.

In general, both decoders sounded quite good, providing compact and directionally accurate imaging down to the horizontal limit of the playback array. Sources below the horizon were reproduced at the horizon, fading out as they were panned towards the nadir. The SSF-based decoder sounded brighter and more detailed than the AllRAD decoder, even though neither decoder used frequency-dependent decoding. It was also noted that with the AllRAD decoder, central sources moved in the opposite direction as the listener leaned to the left and right. With the SSF-based decoder, central sources remained in place.

6 http://www.cdbaby.com/cd/allette4

(a) 250 Hz ITD (b) 1 to 4 kHz ILD; horizontal axis: azimuth (degrees)

Figure 5: Interaural time difference (ITD) and interaural level difference (ILD) as a function of azimuth for full-sphere, AllRAD, and SSF-based decoders. Source elevation is 0°.

Neither of the test decoders sounded as good as the reference dual-band, full-sphere decoder. This was especially noticeable in the reproduction of lower-frequency percussion, which lost some of its impact. This may be attributable to the use of correct low-frequency velocity decoding ($r_V = 1$) in the reference decoder vs. wideband max-$r_E$ decoding in the test decoders.

At the end of the listening session, we used a first-order SSF-based decoder to briefly audition a first-order Soundfield microphone recording of an orchestra made by one of the authors.⁷ In this case, the instrumental balance of the orchestra was incorrect; notably, the woodwinds were almost inaudible. After the listening session, we recalled how this recording was made. The microphone was hung vertically, approximately 3 meters behind and 1.5 meters above the conductor’s head, placing the entire orchestra in the lower hemisphere of the recording. The first-order SSF-based decoder starts fading sources at approximately 20° above the horizon. This caused the instruments at the front of the orchestra to be attenuated significantly. At this point, we cannot recommend this configuration for first-order program material with significant sources in the lower hemisphere. We intend to try two possible workarounds to move important sources to the upper hemisphere:

- inverting the vertical signal, Z, to mirror the soundfield across the z = 0 plane;
- rotating the soundfield about the X-axis (“tilt”).

7 Beethoven: Sym. No. 4 in B-flat Major, Op. 60, 4th Mvt. Available at http://www.ambisonia.com/Members/ajh/ambisonicfile.2008-10-30.6980317146
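Both workarounds are simple linear operations on first-order signals. W is unchanged, and X, Y and Z transform like the direction cosines of the source. The Python sketch below is ours. Because the naming of rotations (“tilt”, “tumble”) differs between conventions, it shows rotations about both the X (front) and Y (left) axes, using the right-hand rule with x forward, y left and z up. Higher orders would need full spherical-harmonic rotation matrices.

```python
import math

def mirror_z(w, x, y, z):
    """Mirror a first-order B-format sample across the z = 0 plane by inverting Z."""
    return w, x, y, -z

def rotate_about_x(w, x, y, z, angle):
    """Rotate the soundfield about the X (front) axis by angle radians."""
    c, s = math.cos(angle), math.sin(angle)
    return w, x, c * y - s * z, s * y + c * z

def rotate_about_y(w, x, y, z, angle):
    """Rotate the soundfield about the Y (left) axis by angle radians."""
    c, s = math.cos(angle), math.sin(angle)
    return w, c * x + s * z, y, -s * x + c * z

# A frontal source 30 degrees below the horizon (first-order components, W = 1).
src = (1.0, math.cos(math.radians(-30)), 0.0, math.sin(math.radians(-30)))
w, x, y, z = rotate_about_y(*src, math.radians(-60))
print(round(math.degrees(math.asin(z)), 1))  # 30.0: the source now lies 30 degrees above
```

Because these operations are applied per sample to all four channels, they can be run on the recording before it reaches the partial-coverage decoder.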

AllRAD decoders generated by the toolbox have been used for performances at Stanford’s Bing Concert Hall and Studio, employing CCRMA’s 24-speaker hemispherical dome loudspeaker array. At the dress rehearsal for a performance in the Concert Hall, we were able to compare the new AllRAD decoder to the projection decoder that had been used for previous concerts. The improvement was clearly audible to all present, with increased clarity and directional focus, especially for sources behind and above the audience.

Good results have also been reported using 
modified inversion for a second-order decoder for 
a 12-speaker trirectangle array that is limited by 
the ceiling height of the room, leaving a large 
gap in coverage at the top and bottom of the 
array. 

5 Decoding Engine 

A new Ambisonic decoder engine was implemented in Faust to support operation beyond third order, a variety of plug-in architectures, and use with third-party SDKs. Faust is a DSP specification language, which can target a variety of plug-in formats and operating systems.

The new implementation comprises about 250 lines of FAUST. It has no inherent limits on the Ambisonic order at which it operates and supports three modes of decoding:

- one decoding matrix with per-order gains ($\Gamma$);
- one decoding matrix with phase-matched shelf filters;
- dual-band, with phase-matched band-splitting filters and two decoding matrices.

The outputs can be delay- and level-compensated for speakers at different distances from the center of the array.
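Delay and level compensation for speakers at different distances is commonly computed relative to the farthest speaker. The following Python sketch shows one such calculation. It rounds delays to whole samples, applies a 1/r level law and assumes a speed of sound of 343 m/s. These choices are ours and not necessarily what the Faust engine does.

```python
SPEED_OF_SOUND = 343.0  # m/s, an assumed value (about 20 degrees C)

def distance_compensation(radii, fs=48_000, c=SPEED_OF_SOUND):
    """Delays (whole samples) and gains that align every speaker with the farthest one."""
    r_max = max(radii)
    delays = [round((r_max - r) / c * fs) for r in radii]  # nearer speakers are delayed more
    gains = [r / r_max for r in radii]                      # 1/r law: nearer speakers are turned down
    return delays, gains

print(distance_compensation([2.0, 1.5, 1.0]))
# ([0, 70, 140], [1.0, 0.75, 0.5])
```

Aligning to the farthest speaker keeps all delays non-negative, so no speaker feed has to be advanced in time.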

Nearfield compensation is supplied by digital state-variable realizations of Bessel filters [Smith 2013b]. It can be applied at the input or output of the decoder, or turned off completely. The current implementation provides filters for operation up to fifth order, although the toolbox includes facilities for automatically generating filters up to approximately 25th order.⁸

User adjustments are supplied for overall gain and muting, as well as crossover frequency and relative levels of high and low frequencies. All realtime controls are “dezippered”, that is, smoothed so that changing them does not produce audible zipper noise. They can be accessed directly through GUI elements or via Open Sound Control.
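Dezippering is typically done with a one-pole lowpass filter applied to each control value at the audio rate. The Python sketch below illustrates the idea. The pole value of 0.999, which gives a time constant of about 1000 samples or roughly 21 ms at 48 kHz, is an assumption. The function name is hypothetical.

```python
def dezipper(targets, pole=0.999, start=0.0):
    """One-pole lowpass y[n] = pole * y[n-1] + (1 - pole) * x[n] applied to control values."""
    y, out = start, []
    for x in targets:
        y = pole * y + (1.0 - pole) * x
        out.append(y)
    return out

ramp = dezipper([1.0] * 5000)   # a gain switched from 0 to 1, smoothed over about 0.1 s at 48 kHz
```

Instead of a step, the gain follows a smooth exponential approach. This removes the discontinuity that would otherwise be heard as a click.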

In practice, the toolbox writes out the configuration section of the decoder and appends the implementation section. This produces a single FAUST “dsp” file containing the full decoder. The FAUST compiler (either online or local) is used to produce a highly optimized C++ class that implements the decoder. This class is then wrapped in a plug-in-specific architecture file that provides the interface to the various SDKs and compiled to produce the plug-in file. At the time of this writing, VST, AU, Max/MSP, Pd, LADSPA, LV2, SuperCollider, and many others are supported on Windows, Mac OS X, and Linux. In addition, an online compiler is available.

The decoder engine implementation can be used apart from the toolbox by editing the configuration options and inserting the per-order gains and matrix coefficients manually. Facilities are provided to generate configuration sections directly from existing AmbDec configuration files.

6 Channel-Order, Normalization, and Mixed-Order Conventions

At present, there are a number of channel-order and normalization conventions in use by the Ambisonics community. The toolbox implements all conventions known to the authors, including variants that adjust the gain of the omnidirectional component (W) to be compatible with B-format. Internally, each channel is annotated with four properties: its degree, its order, its gain relative to full orthonormalization (N3D), and its Condon-Shortley phase. Additional conventions can therefore be added easily, if needed.

8 The limit is imposed by Matlab’s roots() function.
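The bookkeeping described above can be sketched in Python for one common combination: ACN channel order with N3D or SN3D normalization. The annotation format and function names are hypothetical. Condon-Shortley phase is left out for brevity. The W gain shown is the traditional B-format factor of 1/√2 relative to SN3D.

```python
import math

FUMA_W_GAIN = 1 / math.sqrt(2)  # traditional B-format W is 3 dB down relative to SN3D

def acn(l, m):
    """Ambisonic Channel Number of the harmonic with degree l and order m (-l <= m <= l)."""
    return l * l + l + m

def degree_order(index):
    """Inverse of acn(): recover (degree, order) from a channel number."""
    l = math.isqrt(index)
    return l, index - l * l - l

def sn3d_gain(l):
    """Gain that converts an N3D-normalized channel of degree l to SN3D."""
    return 1.0 / math.sqrt(2 * l + 1)

def annotate(order, to_sn3d=False):
    """Per-channel annotation: (acn, degree, order, gain relative to N3D)."""
    return [(j, *degree_order(j), sn3d_gain(degree_order(j)[0]) if to_sn3d else 1.0)
            for j in range((order + 1) ** 2)]

print(annotate(1, to_sn3d=True)[1])  # ACN 1: degree 1, order -1, gain 1/sqrt(3)
```

Storing the gain relative to N3D means any two conventions can be converted through N3D as an intermediate. This avoids the need for a conversion routine for every pair.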

Two mixed-order conventions are supported by the toolbox. The first is the scheme used in the AMB Ambisonic File Format (#H#P) [Dobson 2012]. The second is one proposed by Travis [2009], which gives resolution-versus-elevation curves that are flatter in and near the horizontal plane (#H#V).

7 Conclusions and Future Work 

We have reported on extensions to the Ambisonic Decoder Toolbox to handle popular loudspeaker configurations that do not cover the full sphere, such as hemispherical domes and multilevel rings. The toolbox has also been extended to operate at higher Ambisonic orders and with alternate channel-order and normalization conventions. To support these extensions and multiple plug-in architectures, we have written a new, full-featured decoder in FAUST.

In general, the ability to generate decoders quickly has proven valuable in performance settings where setup time is short and the speakers are not necessarily installed in the planned locations. It also places less emphasis on performance prediction, because a number of decoders can be generated with different methods and parameter settings. These can then be auditioned to determine the best one for a particular set of playback conditions.

Generating dual-band decoders from these alternate methods is an obvious extension for the toolbox, as is using the decoders as initial estimates for the optimizer. Users have requested adding bass management to the decoder implementation. We have also investigated hosting the toolbox on a server and linking directly to the online FAUST compiler, so that a user does not need to install any software to use it.

As highlighted at the end of our listening session, a significant open question with partial-coverage decoders is what should happen if a source moves into a “poor” area, for example, the zenith or nadir directions. The effect of a Spitfire flying low overhead is probably not compromised if it appears too loud or doesn’t have exact localization. Conversely, a source moving underground may be allowed to fade.⁹

The current implementations simply discard these sources, fading them out as they are panned beyond the coverage region. With the AllRAD decoders, these signals can be brought out for further processing by simply making the imaginary speakers into real speakers in the configuration file. However, they cannot be simply mixed into existing speaker feeds. The coherent combination of the signals would distort the directional fidelity of the decoder, especially for sources near the horizon. One proposal is to decorrelate them using a broadband 90° phase shift and then sum them into the speaker feeds. Other suggestions are welcome.
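One way to realize the proposed broadband 90° phase shift offline is a discrete Hilbert transform computed with the DFT. The sketch below is ours and treats the block as one period of a periodic signal. A real-time implementation would more likely use a pair of allpass filters or a long FIR filter. A naive O(n²) DFT keeps the sketch dependency-free.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]

def shift_90(x):
    """Broadband -90 degree phase shift (discrete Hilbert transform) of one block."""
    n = len(x)
    X = dft(x)
    for k in range(n):
        if k == 0 or 2 * k == n:
            X[k] = 0j               # DC and Nyquist cannot be shifted by 90 degrees
        elif 2 * k < n:
            X[k] *= -1j             # positive frequencies
        else:
            X[k] *= 1j              # negative frequencies
    return [v.real for v in idft(X)]

n = 64
x = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]
y = shift_90(x)   # y[t] is approximately sin(2*pi*3*t/n)
```

The shifted signal has the same magnitude spectrum as the original but is orthogonal to it over the block (the sum of x·y is zero). This orthogonality is what keeps it from summing coherently with the direct feeds.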

The toolbox is open source and available under the GNU Affero General Public License, version 3. The FAUST code generated by the toolbox is covered by the BSD 3-Clause License, so that it may be combined with other code without restriction. Contact the authors to obtain a copy of the toolbox.
Abstract

Everyday situations are rich in numerous acoustic events emerging from different origins. Such acoustic scenes may comprise discussions of our fellow human beings, chirping birds, cars, cyclists, and many more. So far, no recording or scene analysis technique for this rich and dynamically changing acoustic environment exists, though one would be needed in order to document or actively shape an acoustic scene. Customised techniques exist for recording symphony orchestras with a static cast, but none that automatically readjusts to scenes with varying content. Thus, a new recording technique is required that analyses the signal content, the position and the activity of all sources in a scene. We present WiLMA, a wireless large-scale microphone array. It is a mobile infrastructure for investigating new recording and analysis techniques.

Keywords

network audio, sensor network, microphone, distributed processing

1 Introduction 

Traditionally, the sensor nodes of a wireless sensor network (WSN) that captures sound events are populated with low-quality microphones, amplifiers and analogue-to-digital converters (ADCs). This decreases sensor node size, power consumption and cost.

The Wireless Large-scale Microphone Array (WiLMA) introduces high-quality audio processing in wireless sensor networks. Each of the sixteen sensor modules (SM) can capture up to four high-end microphone signals. This in turn enables the use of a 4-channel microphone array (e.g. a first-order tetrahedral Ambisonics microphone) per SM. Thus, the system operates as a large-scale microphone array with a total of 64 audio channels. A single SM and the microphone array used are depicted in Fig. 1.

The acquired data from all SMs is transmitted (either wirelessly or over a wired connection) to a central unit (CU) running the host application shown in Figs. 6 and 7. This host application visualises input levels, synchronisation and battery status. Further, it allows the user to individually configure each SM for a specific task.

Figure 1: Sensor module and microphone array (Oktava 4D-ambient)

Each SM is equipped with a local processing unit in order to perform computations on the acquired data. Instead of sending the raw data to the central unit responsible for the fusion, sensor modules can use their processing abilities to carry out simple computations locally. They then transmit only the required, partially processed data. This intelligent sensor network approach results in decreased network traffic and higher flexibility of the system.
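A rough calculation of the raw audio payload shows the scale of the traffic savings. It uses the 48 kHz word clock and the 24-bit ADC described in Section 3 and ignores RTP/UDP/IP header overhead. The helper name is ours.

```python
def pcm_bitrate(channels, fs=48_000, bits=24):
    """Raw PCM payload in bit/s, ignoring RTP/UDP/IP header overhead."""
    return channels * fs * bits

per_sm = pcm_bitrate(4)          # the 4 microphone channels of one SM
full_array = 16 * per_sm         # all 16 SMs, 64 channels
mono_only = pcm_bitrate(1)       # one SM sending only a mono signal (plus small metadata)
print(per_sm, full_array, mono_only)  # 4608000 73728000 1152000
```

About 74 Mbit/s of payload for the full array is a substantial load for a wireless network. Streaming a mono signal plus metadata instead cuts each SM’s share to a quarter.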



Figure 2: Acoustic scene analysis

An example application using the WiLMA hardware is to separate the sources of an acoustic scene and track their movement. It should then be possible to analyse the separated source signals and to assign a specific event to a specific source. Fig. 2 conceptually depicts the process of such a spatial transcription. Areas that could benefit from that application include assisted-living scenarios, acoustical planning, the surveillance of urban areas, multichannel source separation, event detection and source tracking. Another application is high-quality multichannel recording of an acoustic scene. The wireless operation of the system adds the benefit of flexible microphone positioning.

2 Design 

The basic design of the sensor network contains 
a Central Unit and a variable number of Sensor 
Modules. 



Figure 3: Network diagram of multiple synced Sensor Modules and a Central Unit. Legend: clock (radio, wired); control (Ethernet); stream (Ethernet).

The Central Unit controls and monitors the individual modules. The Sensor Modules capture audio autonomously and send their data to the Central Unit, where it can be collected for further processing.

To allow for sample-synchronous audio capturing, all SMs are connected to a central master clock.

2.1 Modes of Operation 

We can distinguish between three different 
modes of operation for each sensor unit: 

2.1.1 Recording 

The simplest operational mode is to record the microphone signals locally on the SM. The recordings should be time-stamped, so that the recordings of multiple SMs can be time-aligned later in an offline process.
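Offline alignment of time-stamped recordings can be as simple as trimming each one to the latest common start. The Python sketch below assumes, hypothetically, that each recording carries the shared sample-clock timestamp of its first sample. The data layout and names are illustrative, not the WiLMA file format.

```python
def align_recordings(recordings):
    """Trim recordings so that index 0 of every list refers to the same clock sample.

    recordings: dict mapping a name to (start_timestamp_in_samples, samples).
    """
    latest = max(start for start, _ in recordings.values())
    trimmed = {name: samples[latest - start:] for name, (start, samples) in recordings.items()}
    shortest = min(len(s) for s in trimmed.values())
    return {name: s[:shortest] for name, s in trimmed.items()}

aligned = align_recordings({"sm01": (100, list(range(10))), "sm02": (103, list(range(10)))})
print(aligned)  # {'sm01': [3, 4, 5, 6, 7, 8, 9], 'sm02': [0, 1, 2, 3, 4, 5, 6]}
```

Because the timestamps are sample-accurate, no resampling or fractional-delay interpolation is needed. An integer offset is sufficient.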

2.1.2 Streaming 

For recording and monitoring purposes, it may often be desirable not to store the audio data decentrally on the SMs and collect it later. Instead, all audio channels can be made available immediately at the Central Unit by means of real-time streaming. This allows the sensor network to be used as a decentralised, capture-only multichannel sound card.

2.1.3 Processing 

Each SM is also equipped with a local processing unit that can be used to do (simple) analysis of the local signals, parallelising the computational load.

The actual processing algorithm might change depending on the application. It must therefore be possible to implement algorithms in a suitable environment and deploy these programs easily on all (or selected) SMs.

The result could be an enhanced signal, metadata about the signal, or a mixture of both. For example, using signal identification on the 4-channel recording, it is possible to stream only a mono version of the signal together with positional metadata.
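As an illustration of such positional metadata, a direction of arrival can be estimated from a first-order (B-format) signal with the pseudo-intensity vector. This vector consists of the time-averaged products W·X, W·Y and W·Z. It is one common technique, not necessarily what WiLMA uses. The sketch assumes that the tetrahedral A-format capsule signals have already been converted to B-format and that a single source dominates. The function name is ours.

```python
import math

def doa_pseudo_intensity(w, x, y, z):
    """Azimuth and elevation (degrees) from the summed products W*X, W*Y and W*Z."""
    ix = sum(a * b for a, b in zip(w, x))
    iy = sum(a * b for a, b in zip(w, y))
    iz = sum(a * b for a, b in zip(w, z))
    azimuth = math.degrees(math.atan2(iy, ix))
    elevation = math.degrees(math.atan2(iz, math.hypot(ix, iy)))
    return azimuth, elevation

# Toy plane wave from azimuth 60 degrees, elevation 20 degrees.
az, el = math.radians(60), math.radians(20)
s = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
w = s
x = [v * math.cos(az) * math.cos(el) for v in s]
y = [v * math.sin(az) * math.cos(el) for v in s]
z = [v * math.sin(el) for v in s]
print([round(a, 1) for a in doa_pseudo_intensity(w, x, y, z)])  # [60.0, 20.0]
```

Only the ratios of the three sums matter, so the normalization of W does not affect the estimated angles.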

2.1.4 Mixed 

Multiple connected SMs need not operate in the same mode. For instance, some SMs could be streaming audio, whereas other SMs would only do processing and send metadata to the Central Unit (as depicted in Fig. 3).

2.2 Communication 

Figure 5: Block diagram of the sensor module

All control communication between the CU and the SMs is based on a bi-directional OSC connection. Typical OSC applications use UDP as the transport protocol, which behaves badly in congested networks. To work around these reliability issues, the transport layer can be configured to use either UDP or TCP/IP with SLIP-based packetizing, as suggested by the OSC 1.1 specification [1].
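SLIP framing (RFC 1055) is simple enough to show in full. The Python sketch below uses the double-ended variant, with an END byte both before and after each packet. This is what we understand the OSC 1.1 proposal to recommend for stream transports. The function names are ours.

```python
END, ESC, ESC_END, ESC_ESC = 0xC0, 0xDB, 0xDC, 0xDD

def slip_encode(packet: bytes) -> bytes:
    """Frame one packet: leading END, escaped payload, trailing END."""
    out = bytearray([END])
    for b in packet:
        if b == END:
            out += bytes([ESC, ESC_END])     # a literal END byte inside the payload
        elif b == ESC:
            out += bytes([ESC, ESC_ESC])     # a literal ESC byte inside the payload
        else:
            out.append(b)
    out.append(END)
    return bytes(out)

def slip_decode(stream: bytes) -> list:
    """Split a byte stream into packets, undoing the escapes; empty frames are skipped."""
    packets, current, escaped = [], bytearray(), False
    for b in stream:
        if escaped:
            current.append(END if b == ESC_END else ESC if b == ESC_ESC else b)
            escaped = False
        elif b == ESC:
            escaped = True
        elif b == END:
            if current:
                packets.append(bytes(current))
                current = bytearray()
        else:
            current.append(b)
    return packets

print(slip_encode(b"\xc0").hex())  # c0dbdcc0
```

On a TCP connection, the decoder can be fed whatever chunks arrive. Packet boundaries are recovered from the END bytes rather than from the read sizes.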

Besides configuring and activating the various modes of operation, the “control channel” carries two further kinds of traffic. The first is basic infrastructure, such as heartbeats sent and received to determine whether the connection is still established (in the case of UDP) and whether the SM is still responsive. The second is health information (e.g. CPU load, memory and disk usage, battery status, sync status, microphone levels). The control channel also allows the CU to configure the SM (e.g. setting the gain of the microphone preamplifier). Finally, it transports all meta-information extracted by any optional processing on the SM.
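On the CU side, heartbeat monitoring amounts to remembering when each SM was last heard from. The following is a minimal Python sketch. The timeout value, the class name and the injectable clock are illustrative assumptions, not part of the WiLMA protocol.

```python
import time

class HeartbeatMonitor:
    """Track the last heartbeat from each SM and report those that have gone quiet."""

    def __init__(self, timeout=3.0, clock=time.monotonic):
        self.timeout = timeout          # seconds without a heartbeat before an SM is flagged
        self.clock = clock              # injectable so the logic can be tested
        self.last_seen = {}

    def beat(self, sm_id):
        """Record a heartbeat received from sm_id."""
        self.last_seen[sm_id] = self.clock()

    def unresponsive(self):
        """Sorted ids of SMs whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(sm for sm, t in self.last_seen.items() if now - t > self.timeout)

# Usage with a fake clock.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout=3.0, clock=lambda: fake_now[0])
monitor.beat("sm01")
fake_now[0] = 1.0
monitor.beat("sm02")
fake_now[0] = 3.5
print(monitor.unresponsive())  # ['sm01']
```

A monotonic clock is used so that wall-clock adjustments on the CU cannot make an SM appear to have timed out.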

Audio streaming from the SM to the CU is not done via OSC (as suggested e.g. by [2]). Instead, it uses the more widespread RTP protocol [3] on top of UDP. The RTP timestamps are synchronised so that the audio signals of multiple SMs can be re-aligned.
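For reference, the 12-byte fixed RTP header defined in RFC 3550 can be packed and parsed with Python’s struct module. This is a generic sketch, not the WiLMA code. It assumes no CSRC list and no header extension, and the dynamic payload type 96 in the example is our choice. On the CU, the wrap-around-aware difference between the 32-bit timestamps of two streams gives their offset in samples.

```python
import struct

RTP_FIXED = struct.Struct("!BBHII")   # V/P/X/CC, M/PT, sequence number, timestamp, SSRC

def pack_rtp(payload, payload_type, seq, timestamp, ssrc, marker=False):
    header = RTP_FIXED.pack(
        2 << 6,                                      # version 2, no padding/extension, CC = 0
        (int(marker) << 7) | (payload_type & 0x7F),
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + payload

def unpack_rtp(packet):
    b0, b1, seq, timestamp, ssrc = RTP_FIXED.unpack_from(packet)
    if b0 >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    return {"marker": bool(b1 >> 7), "payload_type": b1 & 0x7F, "seq": seq,
            "timestamp": timestamp, "ssrc": ssrc, "payload": packet[RTP_FIXED.size:]}

def sample_offset(ts_a, ts_b):
    """Signed difference of two 32-bit RTP timestamps, allowing for wrap-around."""
    d = (ts_a - ts_b) & 0xFFFFFFFF
    return d - (1 << 32) if d >= (1 << 31) else d

pkt = pack_rtp(b"\x00" * 12, 96, seq=1, timestamp=48_000, ssrc=0x1234)
print(unpack_rtp(pkt)["timestamp"])  # 48000
```

Handling wrap-around matters in practice. At 48 kHz, a 32-bit sample counter overflows after roughly 24.8 hours.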

3 Sensor Module 

The Sensor Module (see Fig. 1) consists of a custom hardware design running Linux.

3.1 Audio 

The 4-channel analogue front end is equipped with THAT1570 low-noise, differential microphone preamplifiers. These are digitally controlled via SPI using THAT5173 controller ICs. Analogue-to-digital conversion is performed by an AD1974, a 4-channel, 24-bit ADC with integrated phase-locked loop (PLL).

3.2 Synchronisation 

The internal sampling clock of the AD1974 is derived from the word clock provided by the synchronisation module. Wireless synchronisation within the WiLMA system is established via a 1 pulse-per-second timestamp signal that the master module broadcasts on a sub-GHz ISM band. The synchronisation module is populated with a voltage-controlled crystal oscillator (VCXO), which is disciplined by a frequency-locked loop (FLL). A subsequent frequency divider derives the 48 kHz word clock for the ADC. The sample-accurate timestamps generated by the synchronisation module are multiplexed with the output data of the ADC into an 8-channel/32-bit time-division multiplexing (TDM) stream.
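Splitting one TDM frame into audio and timestamp could look like the Python sketch below. That slots 5 to 8 carry copies of the timestamp follows from the description in Section 3.5. That the 24-bit samples are left-justified in signed 32-bit slots is our assumption. The function name is hypothetical.

```python
def parse_tdm_frame(slots):
    """Split one 8-slot TDM frame into four audio samples and the timestamp.

    slots: eight signed 32-bit integers as read from the interface.
    """
    if len(slots) != 8:
        raise ValueError("expected 8 slots per frame")
    audio = [s >> 8 for s in slots[:4]]      # assumed: 24-bit samples left-justified in 32-bit slots
    stamps = slots[4:]                        # slots 5-8 carry copies of the same timestamp
    if any(s != stamps[0] for s in stamps):
        raise ValueError("timestamp copies disagree")
    return audio, stamps[0] & 0xFFFFFFFF      # reinterpret as unsigned 32-bit

print(parse_tdm_frame([0x123456 << 8, -256, 0, 0, 1000, 1000, 1000, 1000]))
# ([1193046, -1, 0, 0], 1000)
```

The redundant copies come at no cost here, because they can double as a cheap consistency check on each frame.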

3.3 System On Chip 

The heart of each sensor module is a BeagleBone A6 equipped with an ARM Cortex-A8-based processor running Linux. The TDM audio stream is read by an ALSA driver. This driver sets up the ADC, controls the microphone preamplifiers and accesses the Multichannel Audio Serial Port (McASP) via the DaVinci ASoC driver.

3.4 Power Supply 

The power module generates supply voltages for the different modules from either the wall-plug supply or the battery. It also generates an optional 48 V supply voltage for microphones requiring phantom power. The LiPo battery pack is connected to a battery management system, which is responsible for three tasks:

- controlling charge voltage and charge current;
- switching between power sources;
- providing information about the battery status via the I2C bus.

In case of battery undervoltage, the battery management system autonomously disconnects the load from the battery to keep the battery in a safe state.

3.5 Software 

Each SM runs Ubuntu 11.10 (Oneiric Ocelot), using the standard armel architecture packages. The notable exception is the kernel, which is a customised build of linux-3.2.30 because of the ALSA drivers required for the custom sound card.

When the system starts up, a control program (the WILMAsm daemon) is started. This daemon monitors the various health states of the system and runs an OSC server for communication with the CU. The service is announced via ZeroConf/Avahi [4], using the type specifier _wilma-sm._udp (resp. _wilma-sm._tcp).

Since the daemon is implemented in Python, which is not well suited to real-time audio processing, a more appropriate subsystem is needed for the audio-related tasks. This subsystem has been implemented using Pure Data. Pd is a well-known environment and allows algorithm implementations to be deployed in a text-based form, which reduces the need to cross-compile binaries for the target ARM platform.

In order to integrate nicely with the framework, any processing unit needs to adhere to a simple standard, which defines the inlets/outlets of the Pd patch and the filesystem layout.

The Pd implementation used is a slightly modified version of Pd-0.44-2. The main modification has been a customisation for the special audio layout of the SM, which provides an eight-channel audio interface. Only the first four channels contain actual audio data (as sampled from the microphones). The remaining four channels contain a 32-bit timestamp synchronised on all SMs.¹

Pd runs as a subprocess of the control daemon, which monitors the audio process and restarts it in the unlikely event of a crash. The control daemon and the audio process communicate via a bi-directional OSC connection on top of UDP. (No TCP/IP option is given here, as the connection only runs on localhost.)
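A watchdog of this kind can be sketched with the standard subprocess module. The restart policy is an assumption of ours: the process is restarted only after a non-zero exit, and the watchdog gives up after a fixed number of restarts. In the actual system the command would launch Pd, for example something like `pd -nogui -open process.pd`, where the patch name is hypothetical.

```python
import subprocess
import sys

def supervise(cmd, max_restarts=3):
    """Run cmd and restart it after an abnormal exit; return how many times it was launched."""
    launches = 0
    while True:
        launches += 1
        returncode = subprocess.Popen(cmd).wait()
        if returncode == 0 or launches > max_restarts:
            return launches

# Stand-in for a crashing audio process: a Python child that always exits with status 1.
crashing = [sys.executable, "-c", "raise SystemExit(1)"]
print(supervise(crashing, max_restarts=3))  # 4 (the first launch plus three restarts)
```

Capping the number of restarts keeps a patch that crashes immediately on startup from being relaunched in a tight loop. A real daemon would also report the failure to the CU over the control channel.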

4 Central Unit 

The Central Unit is an off-the-shelf Linux system, optionally equipped with a MADI audio interface in order to play back the independent audio streams from 16 SMs. It runs the audio stream aggregator and control application WILMix.

1 This obviously encodes the timestamp in a highly redundant way. The main reason for this redundancy is that the AD1974 makes it easy to copy a single 32-bit auxiliary digital data word into four channels at once. Since channels 5 to 8 are unused anyhow, no immediate drawback arises from this redundant data handling.



Figure 6: WILMix overview of available SMs

The control application provides a user interface for controlling and monitoring the various aspects of the SMs, like starting audio streaming, distributing process patches or collecting recordings.



Figure 7: WILMix controlling a specific SM 

The application uses ZeroConf to detect all available SMs in the local network, and constructs a mixer application for the given number of channels.

The audio stream aggregator receives the RTP streams from the various SMs and realigns them in time, so that they can be played back sample-synchronously.
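The realignment step can be sketched as follows. The sketch assumes, hypothetically, that the aggregator knows the embedded sample timestamp of the first sample of each received stream. Dropping leading samples so that every stream starts at the latest common timestamp makes the streams sample-synchronous. `alignmentOffsets` is an illustrative name and is not part of WILMix.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Given the timestamp of the first sample of each stream, returns how many
// leading samples to drop from each stream so that all of them start at the
// latest common instant. Differences are taken modulo 2^32, so a counter
// wrap-around is tolerated as long as streams start within 2^31 samples.
std::vector<uint32_t> alignmentOffsets(const std::vector<uint32_t>& firstTimestamps) {
    assert(!firstTimestamps.empty());
    uint32_t latest = firstTimestamps[0];
    for (uint32_t ts : firstTimestamps) {
        if (static_cast<int32_t>(ts - latest) > 0) latest = ts;
    }
    std::vector<uint32_t> offsets;
    offsets.reserve(firstTimestamps.size());
    for (uint32_t ts : firstTimestamps) {
        offsets.push_back(latest - ts);  // samples to skip in this stream
    }
    return offsets;
}
```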

As with the SMs, the control part of the application is implemented in Python, whereas the audio processing part is written in Pd; both communicate via OSC over UDP.

5 Discussion 

While the current software implementation 
works as a proof of concept, there are certainly 
things to improve. 

For one thing, the use of Pure Data on an ARM Cortex-A8 is suboptimal, as the processor lacks a fast FPU, whereas Pd does all processing on floating-point samples.

Implementations using alternative frameworks that would allow for fixed-point arithmetic (such as GStreamer [5]) were initially planned. They were soon discarded in order to avoid cross-compilation environments altogether. (This is a major issue when the prospective algorithm implementers are MATLAB-spoilt, C-agnostic students.)
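To illustrate what fixed-point processing means in practice, here is a minimal Q15 multiply in C++ that uses only integer arithmetic. This is a generic sketch of the technique, not code from GStreamer or WiLMA.

```cpp
#include <cassert>
#include <cstdint>

// Q15: a 16-bit signed integer representing value / 32768, range [-1, 1).
using q15_t = int16_t;

// Multiplies two Q15 numbers with integer arithmetic only: the 32-bit
// product is in Q30, so shifting right by 15 returns to Q15. The result is
// saturated, since (-1) * (-1) = +1 is not representable in Q15.
q15_t q15Mul(q15_t a, q15_t b) {
    const int32_t product = static_cast<int32_t>(a) * static_cast<int32_t>(b);
    int32_t result = product >> 15;
    if (result > INT16_MAX) result = INT16_MAX;
    if (result < INT16_MIN) result = INT16_MIN;
    return static_cast<q15_t>(result);
}

// Converts a float in [-1, 1) to Q15; on an FPU-less target such constants
// would be precomputed rather than converted at run time.
q15_t toQ15(float x) {
    assert(x >= -1.0f && x < 1.0f);
    return static_cast<q15_t>(x * 32768.0f);
}
```

For example, 0.5 × 0.5 becomes `q15Mul(16384, 16384)`, which returns 8192, i.e. 0.25 in Q15.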

Even with Pd as the audio engine, it might be advisable to use its library incarnation libpd [6] rather than a full-fledged Pd, as it would greatly simplify the communication between the control application and the audio engine. Using libpd, it should even be possible to get rid of the modifications currently needed to obtain the 32-bit timestamps from the audio channels. 2
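The limitation mentioned in footnote 2 can be demonstrated directly. A single-precision float has a 24-bit significand, so integers above 2^24 = 16777216 can no longer all be represented exactly. The sketch below round-trips a 32-bit value through `float`, as would happen if the timestamp had to be stored in a Pd number.

```cpp
#include <cassert>
#include <cstdint>

// Round-trips a 32-bit timestamp through a single-precision float, as would
// happen if it had to be stored in a Pd number (a 32-bit float).
uint32_t roundTripThroughFloat(uint32_t timestamp) {
    // Values at or above 0xFFFFFF80 round up to 2^32, which does not fit
    // back into uint32_t, so they are excluded here.
    assert(timestamp < 0xFFFFFF80u);
    const float asFloat = static_cast<float>(timestamp);
    return static_cast<uint32_t>(asFloat);
}

// True if the timestamp survives the round trip unchanged.
bool exactInFloat(uint32_t timestamp) {
    return roundTripThroughFloat(timestamp) == timestamp;
}
```

For illustration only, assume that the timestamp counts samples at 48 kHz. It would then exceed 2^24 after roughly 349 s, i.e. after less than six minutes of operation.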

6 Availability 

The source code for the WiLMA application (running on both the SMs and the CU) has been released under the GNU GPL, and is available for download from GitHub. 3

The hardware has been designed in-house at the Institute of Electronic Music and Acoustics. However, the schematics have not yet been published under an open license.

7 Conclusions 

The WiLMA hardware introduces high-quality audio processing to wireless sensor networks. The overall system comprises 16 sensor modules that allow for recording up to 64 audio channels. Audio signals in the frequency range between 20 Hz and 20 kHz are converted with a high-quality 24-bit ADC. The information from each sensing module is collected by a central unit, which combines the individual data into a final result. Data transmission between the SMs and the central unit can be either wireless (WLAN) or wired (Ethernet). The capsules of the microphone arrays used (Oktava fD) exhibit a linear frequency response (no sound colouration) and minimal gain mismatch between capsules. Furthermore, the system offers a runtime of up to 8 hours in battery-powered mode, which ensures its mobile and flexible use.

In order to allow for the application of algorithms from acoustic field theory, the audio streams of different SMs are synchronised with an accuracy of one sample (~20 µs).

2 The timestamps cannot be read directly in patch space, as Pd does not provide a 32-bit integer type: all numbers are equal... and they are (single-precision!) floats.

3 https://github.com/iem-projects/WILMAmix/
Abstract

We introduce the vsti-poly.cpp architecture for the Faust programming language. It provides several features that are important for the practical use of FAUST-generated VSTi synthesizers. We focus on the VST architecture because it has been used traditionally and is supported by many popular tools. To it we add several important features: polyphony, note history and pitch-bend support. These features take FAUST-generated VST instruments a step closer to plugins that can be used in Digital Audio Workstations (DAWs) for real-world music production.

Keywords 

Faust, VST, Plugin, DAW 

1 Introduction 

FAUST [5] is a popular music/audio signal processing language developed by Yann Orlarey et al. at GRAME, 1 with contributions from a community of developers. The FAUST toolset enables the generation of standalone synthesizers as well as plugins for various operating systems and environments. FAUST is a convenient tool and a fast way of prototyping, and even of creating production-level sound effects and synthesizers. We would therefore like to use it in combination with real-world music production tools and DAWs (Digital Audio Workstations).

We believe it is necessary to facilitate working with tools such as Cubase, Ableton or other DAWs that provide a similar level of user experience and features. In the past ten years, these tools have shifted to a plugin-based architecture. Previously they relied on built-in PC sound cards (such as the Sound Blaster) or on external MIDI-controlled modules. Plugins are used to generate sound and apply audio effects. Several common plugin architectures exist: VST, Apple’s Audio Unit (AU), and LV2 (the successor of LADSPA and DSSI on Linux). The VST (Virtual Studio Technology) plugin standard was released in 1996 by Steinberg GmbH, known for Cubase and other music and sound production products. It was followed by the widespread version 2.0 in 1999 [8]. It is a particularly common format, supported by many older and newer tools.

1 http://faust.grame.fr

Some of the features expected from a VST plugin can be found in the VST SDK code. 2 Examining the list of MIDI events [1] can also hint at which capabilities instrument plugins are expected to implement. We also draw on our experience with MIDI instruments and commercial VST plugins in order to formulate sound feature requirements.

In order for FAUST to be a practical tool for 
generating such plugins, it should support most 
of the features expected, such as the following: 

• Responding to MIDI keyboard events 

• Polyphony 

• Portamento 

• Pitch-bending (wheel controlled) 

• Arpeggio 

• Other effects dependent on note occurrence history

All of the plugin formats mentioned above can be generated from FAUST code, with varying levels of feature support. For example, there is a very complete faust2lv2 shell script by Albert Gräf, distributed with Faust [3]. There is also a highly useful faust2au script by Reza Payami that is still under development. Useful VST 2.4 plugins can be generated using the faust2vst script. Relatively limited VSTi plugins (i.e., VST synthesizer or “instrument” plugins) can be generated using faust2vsti. Initial VSTi support was limited to a single voice (implemented in the FAUST architecture file vsti-mono.cpp).

2 Specifically in the PlugCanDos namespace, declared in audioeffectx.cpp (in the VST 2.4 SDK).

This paper describes the VSTi support implemented in the FAUST architecture file vsti-poly.cpp. 3 This effort adds polyphony support, pitch-bend, note history, and other features described below. Pitch-bend and note-history support facilitate effects such as portamento slide 4 and the creation of arpeggiators. Finally, we provide an example of how the architecture can be used to create instruments. We demonstrate the use of Faust-generated VST plugins with the MuLab [4] and Renoise [7] workstations. We also discuss possible future improvements and additions.

Related work 

For handling MIDI events and polyphony support in a Faust architecture file, we benefited from the MIDI plugin section of [3] and from the source code of the Faust DSSI architecture file dssi.cpp. Additionally, vsti-mono.cpp was useful as a basis for our extended Faust VSTi architecture.

2 Design 

Following the convention introduced by Albert Gräf for faust2pd [2] and faust2lv2 [3] et al. [6], the VST architecture file recognizes the “freq”, “gate” and “gain” FAUST control labels. It uses them to set the note and velocity upon MIDI Note-On events (0x90) and to set the gate to 0 for a MIDI Note-Off event (0x80). One approach to implementing polyphony for the VSTi architecture is to do it similarly to the DSSI plugin architecture. There, the “freq”, “gate” and “gain” controls are mapped multiple times, which enables a predefined maximum number of notes to be played simultaneously.
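The freq/gate/gain convention can be sketched in C++ as below, showing a single voice for brevity:

- The message type is taken from the upper nibble of the status byte (0x90 Note-On, 0x80 Note-Off).
- The note number is converted with the usual equal-tempered formula f = 440 · 2^((n - 69)/12).
- Following the MIDI specification, a Note-On with velocity 0 is treated as a Note-Off.

The linear velocity-to-gain mapping and the names `VoiceControls` and `applyMidi` are assumptions for illustration, not the actual code of the architecture file.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Values written to one voice's "freq", "gain" and "gate" controls.
struct VoiceControls {
    float freq = 440.0f;
    float gain = 0.0f;
    float gate = 0.0f;
};

// Equal-tempered conversion from MIDI note number to Hz (note 69 = A4 = 440 Hz).
float midiNoteToHz(int note) {
    assert(note >= 0 && note <= 127);
    return 440.0f * std::pow(2.0f, (note - 69) / 12.0f);
}

// Applies a three-byte channel message to one voice. The upper nibble of the
// status byte selects the message type; the lower nibble (MIDI channel) is
// ignored in this sketch.
void applyMidi(VoiceControls& v, uint8_t status, uint8_t data1, uint8_t data2) {
    const uint8_t type = status & 0xF0;
    if (type == 0x90 && data2 > 0) {            // Note-On
        v.freq = midiNoteToHz(data1);
        v.gain = data2 / 127.0f;                // velocity mapped linearly to 0..1
        v.gate = 1.0f;
    } else if (type == 0x80 || type == 0x90) {  // Note-Off, or Note-On with velocity 0
        v.gate = 0.0f;
    }
}
```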

We combine the approaches taken in vsti-mono.cpp and dssi.cpp. Figure 1 shows a UML diagram describing our design (vsti-poly.cpp). A VST host interacts with the VST plugin through the AudioEffectX interface. The Faust class defines the functionality of the plugin by implementing that interface. The mydsp class performs the signal processing and synthesis; it is the code that is actually produced by the FAUST compiler. We instantiate mydsp for each voice (Voice class).

3 It is expected that this name will later change to 
vsti.cpp. The faust2vsti command-line script will of 
course be updated as well in that case. 

4 Although for a monophonic synthesizer portamento 
can be implemented by smoothing the input frequency. 


The VST plugin controls are created and updated using the vstUI class. The Faust class holds an instance of vstUI that is used for knobs and sliders. These are controlled by the user via the graphical interface or via mapped MIDI controls. This instance controls parameters that are global and should affect every note played. The vstUI instances created as part of each Voice instance control per-note parameters (frequency, gain, previously played frequency and gate). The Faust class implementation of the setParameter interface method broadcasts any change of a global plugin parameter to all Voice instances.

Handling MIDI events 

The Faust VSTi architecture handles MIDI events delegated by the VST host. The host sends the events to the plugin by calling processEvents. An event of type kVstMidiType indicates a MIDI event.

Note On 

A MIDI note-on event (upper nibble of the status byte is 0x9) triggers a search of the freeVoices list in the Faust class for a free voice instance to handle the new note. The search proceeds in a classic round-robin pattern, as found in hardware synthesizers. If a free voice is found, it is designated as the new voice; otherwise, the oldest playing voice is stolen and designated as the new voice. Its frequency is set according to the note number, the gain parameter is set according to the note velocity, and the gate is set to 1. An entry mapping the note to the voice index is added to playingVoices, and the voice index is removed from the freeVoices list. The previously played note is saved in order to enable the portamento slide.
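A minimal sketch of the allocation policy described above follows. It mirrors the `freeVoices`/`playingVoices` bookkeeping from the text. The class `VoiceAllocator` and its methods are hypothetical and simplified. For instance, re-triggering a note that is still held allocates a second voice and leaves the first one to be stolen later.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <map>
#include <optional>

class VoiceAllocator {
public:
    explicit VoiceAllocator(int numVoices) {
        assert(numVoices > 0);
        for (int i = 0; i < numVoices; ++i) freeVoices_.push_back(i);
    }

    // Note-On: picks a free voice in round-robin order, or steals the
    // oldest allocated voice if none is free.
    int noteOn(int note) {
        int voice;
        if (!freeVoices_.empty()) {
            voice = freeVoices_.front();
            freeVoices_.pop_front();
        } else {
            voice = allocOrder_.front();  // oldest voice
            allocOrder_.pop_front();
            forgetVoice(voice);
        }
        playingVoices_[note] = voice;
        allocOrder_.push_back(voice);
        previousNote_ = lastNote_;        // kept for the portamento slide
        lastNote_ = note;
        return voice;
    }

    // Note-Off: returns the voice whose gate should be set to 0. The voice is
    // not freed yet, because its release tail may still be sounding.
    std::optional<int> noteOff(int note) {
        auto it = playingVoices_.find(note);
        if (it == playingVoices_.end()) return std::nullopt;
        const int voice = it->second;
        playingVoices_.erase(it);
        return voice;
    }

    // Called when silence detection reports that `voice` has gone silent.
    void markFree(int voice) {
        auto it = std::find(allocOrder_.begin(), allocOrder_.end(), voice);
        if (it == allocOrder_.end()) return;  // already free
        allocOrder_.erase(it);
        forgetVoice(voice);
        freeVoices_.push_back(voice);         // back of the queue: round robin
    }

    std::optional<int> previousNote() const { return previousNote_; }

private:
    // Removes any note mapping that still points at `voice`.
    void forgetVoice(int voice) {
        for (auto it = playingVoices_.begin(); it != playingVoices_.end(); ++it) {
            if (it->second == voice) { playingVoices_.erase(it); return; }
        }
    }

    std::deque<int> freeVoices_;
    std::deque<int> allocOrder_;        // allocated voices, oldest first
    std::map<int, int> playingVoices_;  // note number -> voice index
    std::optional<int> lastNote_;
    std::optional<int> previousNote_;
};
```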

The VST format processes multiple samples per processing block. The note-on event includes a sample offset within the current block. These offsets are stored in a list so that multiple note-on events can be handled in the same block. The note-to-voice allocation occurs within the processing loop, so that each note starts at its correct sample position within the block.
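Sample-accurate event handling can be sketched as rendering the block in segments: audio up to each event's offset is computed first, then the event is applied. The callbacks `render` and `apply` stand in for the voice processing and the note-to-voice allocation. All names are illustrative and are not taken from vsti-poly.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// A note event with its sample offset inside the current block.
struct TimedEvent {
    int offset;
    int note;
};

// Renders one block in segments so that every event takes effect at its
// exact sample position.
void processBlock(std::vector<TimedEvent> events, int blockSize,
                  const std::function<void(int start, int count)>& render,
                  const std::function<void(const TimedEvent&)>& apply) {
    assert(blockSize > 0);
    std::stable_sort(events.begin(), events.end(),
                     [](const TimedEvent& a, const TimedEvent& b) { return a.offset < b.offset; });
    int pos = 0;
    for (const TimedEvent& ev : events) {
        const int at = std::clamp(ev.offset, 0, blockSize - 1);
        if (at > pos) {
            render(pos, at - pos);  // audio before the event
            pos = at;
        }
        apply(ev);                  // e.g. allocate a voice and open its gate
    }
    if (pos < blockSize) render(pos, blockSize - pos);  // rest of the block
}
```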

Note Off 

A MIDI note-off event (upper nibble of the status byte is 0x8) triggers a search of the playingVoices list in the Faust class for the corresponding Voice instance. The gate is then set to 0. Because the voice may have a release tail after the gate is zeroed, a silence detection algorithm determines when the voice index should be added to the freeVoices list. The voice output must be below the silence threshold for an entire block before the voice is marked as free. Silence detection prevents still-sounding voices from being reallocated prematurely. It also improves CPU efficiency compared with always processing all voices. Like note-on events, note-off events are sample-accurate within a block.

Figure 1: Faust VSTi design
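The block-wise silence test can be sketched as follows. The threshold is a free parameter. The value 1e-4 (about -80 dBFS) used in the usage example is an assumption, as the text does not state the threshold used by vsti-poly.cpp.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// True if every sample of the block is below the threshold in magnitude.
bool blockIsSilent(const float* samples, std::size_t count, float threshold) {
    assert(samples != nullptr && threshold >= 0.0f);
    for (std::size_t i = 0; i < count; ++i) {
        if (std::fabs(samples[i]) >= threshold) return false;
    }
    return true;
}

// Per-voice release tracking: a released voice may be returned to the free
// list only after one whole block of its output stays below the threshold.
struct ReleaseTracker {
    bool released = false;  // set when the gate goes to 0

    bool canFree(const float* block, std::size_t count, float threshold) const {
        return released && blockIsSilent(block, count, threshold);
    }
};
```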

Pitch Bend 

A MIDI pitch bend is indicated by a status byte whose upper nibble is 0xE. The pitch argument of the MIDI event is a 14-bit value in the range 0..16383, with 8192 as the centre position. We normalize it to the range -1..1 and broadcast the value to all voices, thus affecting all currently playing notes. The frequency is not updated by the architecture, as it is the responsibility of the FAUST code to use the pitchbend control value. This separation enables the user to ignore the pitch-bend MIDI event or handle it according to the desired behavior.
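The pitch-bend decoding can be sketched as below. The two 7-bit data bytes (LSB first) are combined into a 14-bit value, and the centre value 8192 is mapped to 0. The exact scaling used in vsti-poly.cpp is not given in the text. Dividing by 8192 is one common choice, which makes the upper end slightly less than 1.

```cpp
#include <cassert>
#include <cstdint>

// Combines the two 7-bit data bytes of a pitch-bend message (status 0xEn),
// LSB first, into the 14-bit value 0..16383 (8192 = wheel centred).
int pitchBendValue(uint8_t lsb, uint8_t msb) {
    assert(lsb < 128 && msb < 128);
    return (msb << 7) | lsb;
}

// Normalizes the 14-bit value to -1..1 with 0 at the centre. Because the
// range is asymmetric around 8192, the maximum is 8191/8192, just below 1.
float normalizePitchBend(int value) {
    assert(value >= 0 && value <= 16383);
    return (value - 8192) / 8192.0f;
}
```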

All-notes-off Event 

The All-notes-off MIDI event is indicated by a note number of 0 and a velocity of 0. As with a single note-off event, the voice gate is set to 0 and the voice enters the release silence-detection state. This is done for all active voices.

Portamento Slide Implementation 

We demonstrated the very common portamento slide effect by creating a Faust VSTi based on the sawtooth synthesizer that is part of the Faust oscillator library (oscillator.lib). We added a portamento control that can take values in the range 0.01..0.3. The portamento effect is achieved by mixing two exponentials, one decaying and one saturating, with a characteristic time τ (in seconds) equal to the value of the portamento control:

$$f_{mixed} = f_{new}\left(1 - e^{-\frac{t}{\tau \cdot SR}}\right) + f_{prev}\, e^{-\frac{t}{\tau \cdot SR}}$$

In this equation:

- SR is the sampling frequency.
- t is the time (in samples) that has passed since the new note was played.
- f_new and f_prev are the new and previously played frequencies, respectively.

This instrument also supports pitch bending controlled by the pitch-bend wheel. f_new is actually the sum of the note frequency and the value of the pitch-bend control (in the range -1..1) multiplied by 20, i.e. a bend of at most ±20 Hz. The demo synthesizer source code is presented in Alg. 1.
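The glide equation above translates directly into a per-sample computation. This C++ sketch is only meant to make the roles of t (samples), τ (seconds) and SR explicit. In Alg. 1 the same computation is expressed in Faust, with additional clamping to 20..20000 Hz and smoothing.

```cpp
#include <cassert>
#include <cmath>

// Portamento glide: the previous frequency decays towards the new one with
// time constant tau (seconds); tSamples counts samples since the new note.
double portamentoFreq(double fNew, double fPrev, double tSamples,
                      double tau, double sampleRate) {
    assert(tau > 0.0 && sampleRate > 0.0 && tSamples >= 0.0);
    const double decay = std::exp(-tSamples / (tau * sampleRate));
    return fNew * (1.0 - decay) + fPrev * decay;
}

// Target frequency including the pitch wheel, as in Alg. 1: the bend value
// (-1..1) is scaled to an offset of at most +/-20 Hz (not semitones).
double bendedFreq(double freq, double bend) {
    return freq + bend * 20.0;
}
```

At t = 0 the result equals f_prev. After one time constant (t = τ·SR samples) the remaining gap to f_new has shrunk to e^-1 (about 37 %) of its initial value.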

A demonstration of music production using FAUST can be found at http://stanford.edu/~yanm2/music/faustloop.mp3.

This short loop was produced using only Faust-generated VSTi plugins, with the exception of the drums.



Figure 2: VST plugin generated by Faust as it appears in MuLab. Using the predefined control names “freq”, “gain”, “gate”, “prevfreq” and “pitchbend” automatically maps the controls to MIDI event parameters.

3 Installation and Basic Usage 

Basic installation instructions are provided in [3]. If you are using an up-to-date version of FAUST, you should already have the vsti-poly.cpp architecture file, and running

make install

should make the faust2vsti tool accessible from any directory. Running

faust2vsti <yourfaustcode.dsp>

will create a VST effect or synthesizer from the .dsp file. To produce only the source code (.cpp), run

faust -a vsti-poly.cpp -o <output filename> <yourfaustcode.dsp>

Figure 3: VST plugin generated by Faust as it appears in the Renoise tracker.

vsti-poly.cpp currently supports both VST audio processing plugins and VSTi (MIDI-driven software synthesizer) plugins. In the future, we expect to consolidate all the VST-related architecture files in the FAUST project.

4 Future Work 

In this section we briefly offer suggestions for 
future development, based on our observations 
during this project. 

Inherent portamento slide support 

Portamento slide is common to many synthesizers, so it may be a good idea to incorporate support for it into the architecture file. This effect requires a gradual change of frequency that could be performed by vsti-poly.cpp. The speed of the transition to the new frequency could be determined by a “portamento” control, as is done with other controls recognized by the architecture.

Inherent pitch-bend support 

Pitch-bending is also common to many synthesizers and requires a change of frequency. This change of frequency could be performed by the architecture prior to calling mydsp::compute. 5
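One possible design for architecture-side pitch bend is sketched below. It is hypothetical, not existing vsti-poly.cpp code. The architecture would overwrite each voice's “freq” control before calling compute, bending by a configurable range in semitones. Unlike the fixed ±20 Hz offset of Alg. 1, this multiplicative bend covers the same musical interval for every note.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical architecture-side pitch bend: returns the value to write into
// a voice's "freq" control, given the note frequency, the normalized wheel
// position (-1..1) and the bending range in semitones.
double applyPitchBend(double noteFreq, double bend, double rangeSemitones) {
    assert(bend >= -1.0 && bend <= 1.0 && rangeSemitones >= 0.0);
    return noteFreq * std::pow(2.0, bend * rangeSemitones / 12.0);
}
```

For example, with a range of 2 semitones, full upward bend turns 440 Hz into about 493.9 Hz. With a range of 12 semitones, full downward bend turns 440 Hz into 220 Hz.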

5 This of course requires the synth to use a “freq” control and not only a note identifier, as we suggest in the next paragraph. It would also require a way to specify the bending range.
Algorithm 1 sawtooth-synth: sawtooth with portamento and pitch-bend in Faust
declare name "Sawtooth-Synth";

import("music.lib");
import("oscillator.lib");

gate = button("gate");

gain = hslider("gain[unit:dB][style:knob]", -10, -30, +10, 0.1) : db2linear : smooth(0.999);
freq = nentry("freq[unit:Hz]", 440, 20, 20000, 1);
prevfreq = nentry("prevfreq[unit:Hz]", 440, 20, 20000, 1);

portamento = vslider("[5] Portamento [unit:sec] [style:knob] [tooltip: Portamento (frequency-glide) time-constant in seconds]", 0.1, 0.01, 0.3, 0.001);
pitchbend = vslider("pitchbend", 0, -1, 1, 0.01);

start_time = latch(freq - freq', time);

dt = time - start_time;

// decaying exponential with time constant tau (seconds); dt is in samples
expo(tau) = exp(0 - dt/(tau*SR));

mix(tau, f, pf) = f*(1 - expo(tau)) + pf*expo(tau);

// pitch wheel adds at most +/-20 Hz
bended_freq = freq + pitchbend * 20;

// glide from prevfreq towards the bent note frequency, clamped to 20..20000 Hz
sfreq = mix(portamento, bended_freq, prevfreq) : min(20000) : max(20);

x = sawtooth(sfreq : smooth(0.999));
process = x * gain * gate;


Setting note identifier control in addition to frequency

Currently the pitch is set by a “freq” control, which the FAUST code uses to determine the frequency. The architecture sets the “freq” control value according to the note identifier received in the MIDI Note-On event. Sometimes it is more useful to have the note identifier or piano key identifier. For instance, there are existing FAUST synthesizers that take the key as input. A percussion synthesizer that produces a different sound for every key would likely use a key identifier instead of the note frequency. It would therefore be a welcome addition to the vsti-poly architecture to set the value of a note identifier control on each Note-On event.

Extended note history 

We currently save only the previously played frequency, which enables the implementation of the portamento slide. Synthesizers that produce chords or arpeggios may require information about more of the previously played notes. This would be enabled by extending the saved note history. Passing these values to the Faust code would require the instantiation of multiple note or frequency controls.
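An extended note history could be kept in a small ring buffer, as in the following sketch. The class `NoteHistory` is an illustration only, as is the idea of exposing each entry as an additional control (e.g. a hypothetical “prevfreq2”).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Fixed-size history of recently played note frequencies, newest first.
template <std::size_t N>
class NoteHistory {
    static_assert(N > 0, "history needs at least one slot");

public:
    // Records a new note; the oldest entry is overwritten once full.
    void push(float freq) {
        head_ = (head_ + N - 1) % N;  // move the head one slot backwards
        buf_[head_] = freq;
        if (size_ < N) ++size_;
    }

    std::size_t size() const { return size_; }

    // k = 0 is the most recent note, k = 1 the one before, and so on.
    float get(std::size_t k) const {
        assert(k < size_);
        return buf_[(head_ + k) % N];
    }

private:
    std::array<float, N> buf_{};
    std::size_t head_ = 0;
    std::size_t size_ = 0;
};
```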

Single Faust VST architecture file 

Currently there are several FAUST architecture files related to VST: vst.cpp, vst2p4.cpp, vsti-mono.cpp and vsti-poly.cpp. These have been kept side by side so as not to interfere with other users during the development of each new architecture file. However, they are redundant and should be consolidated into a single vsti.cpp architecture file.

Shared signals among multiple voices 

Many synthesizers offer modulation sources that affect multiple voices simultaneously. For example, an LFO can modulate pitch or waveform on all voices of a polyphonic synth. In the future, it would be beneficial if shared-signal support were provided to the synthesizer designer.

Enhanced GUI support 

Other architectures within the Faust ecosystem have richer GUI layout capabilities. Grouping controls into subsections and specifying knobs vs. sliders would provide better flexibility and organization, comparable to hand-coded VSTi plugins.

Further Host-Plugin integration 

One simple yet useful feature to implement is the Bypass capability, enabling the user to turn off a plugin from the host.

Further information that the host can provide to the plugin includes time and tempo. This can be useful for implementing arpeggio instruments, or tempo-dependent audio effects such as gating or synchronized echo.
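As an example of tempo-dependent processing, a synchronized echo needs its delay length in samples derived from the host tempo. The conversion is sketched below, assuming that one beat is a quarter note. Obtaining the tempo from the host is outside the scope of the sketch.

```cpp
#include <cassert>

// Delay length in samples for an echo synchronized to the host tempo.
// `beats` is the delay in quarter-note beats (1.0 = one beat, 0.5 = an eighth).
double syncedDelaySamples(double bpm, double sampleRate, double beats) {
    assert(bpm > 0.0 && sampleRate > 0.0 && beats > 0.0);
    const double secondsPerBeat = 60.0 / bpm;
    return beats * secondsPerBeat * sampleRate;
}
```

At 120 BPM and 44.1 kHz, a one-beat echo corresponds to 22050 samples.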

Consolidation of various VST-related architecture files

At the time of writing, the FAUST code base contains multiple architecture variants pertaining to VST:

- vst.cpp and vst2p4.cpp for effects;
- vsti-mono.cpp for monophonic instruments;
- vsti-poly.cpp, introduced by this work, supporting effects, polyphonic instruments and the other features discussed in this paper.

We suggest that there should be one architecture incorporating all of the mentioned functionality. Meanwhile, since portamento is very relevant to monophonic instruments, we added the modifications needed to support the effect in vsti-mono.cpp, as well as support for pitch-bend.

5 Conclusion 

We presented the vsti-poly.cpp FAUST architecture file and its new features: polyphony, pitch-bend and note history. We used these features to implement a polyphonic sawtooth synthesizer with pitch-bend and portamento slide support. We demonstrated it in a short musical loop recorded in a popular DAW. We also suggested ideas for further development of VSTi support in Faust, which will make common synthesizer features easier to implement. The ideas presented here are not limited to the VSTi architecture; they could also serve as a reference for implementing FAUST architectures for other plugin formats.

References 

[1] MIDI Manufacturers Association. MIDI messages. http://www.midi.org/techspecs/midimessages.php.

[2] Albert Gräf. Interfacing Pure Data with FAUST. In Proc. 5th Int. Linux Audio Conf. (LAC-07), TU Berlin, 2007. http://www.kgw.tu-berlin.de/~lac2007/proceedings.shtml. http://www.grame.fr/ressources/publications/lac07.pdf.

[3] Albert Gräf. Creating LV2 plugins with Faust. In Proc. 11th Int. Linux Audio Conf. (LAC-13), Graz, 2013. http://lac.linuxaudio.org/. http://wiki.faust-lv2.googlecode.com/hg/faust-lv2-lac13-full.pdf.

[4] MuTools. MuLab. http://www.mutools.com/. http://www.mutools.com/mulab-product.html.

[5] Yann Orlarey, Dominique Fober, and Stéphane Letz. Faust: an efficient functional approach to DSP programming. New Computational Paradigms for Computer Music, 2009.

[6] Yann Orlarey, Albert Gräf, and Stefan Kersten. DSP programming with FAUST, Q and SuperCollider. In Proc. 4th Int. Linux Audio Conf. (LAC-06), ZKM Karlsruhe, pages 39-40, 2006. http://lac.zkm.de/2006/proceedings.shtml. http://lac.zkm.de/2006/proceedings.shtml#orlarey_et_al.

[7] Renoise. Renoise. http://www.renoise.com.

[8] Wikipedia. Virtual studio technology.





Latency Performance for Real-Time Audio on BeagleBone Black 


James William TOPLISS Victor ZAPPI Andrew McPHERSON 

Centre for Digital Music, School of EECS 
Queen Mary University of London 
Mile End Road 
E1 4NS London,

United Kingdom, 

j.w.topliss@selO.qmul.ac.uk victor.zappi@qmul.ac.uk a.mcpherson@qmul.ac.uk 


Abstract 

In this paper we present a set of tests aimed at evaluating the responsiveness of a BeagleBone Black board in real-time interactive audio applications. The default Angstrom Linux distribution was tested without modifying the underlying kernel. Latency and audio quality were compared across combinations of different audio interfaces and audio synthesis models. Data analysis shows that the board is generally characterised by remarkably high responsiveness. Most of the tested configurations exhibit less than 7 ms of latency, and under-run activity proved to be containable using the correct optimisation techniques.

Keywords 

Embedded systems, BeagleBone Black, responsiveness, latency, real-time.

1 Introduction 

Research in Music Technology, and in particular on Digital Musical Instruments (DMIs), is strongly connected to the field of Human-Computer Interaction (HCI). Many other disciplines involving HCI, such as Ubiquitous Computing [Kranz et al., 2009] and Augmented Reality [Langlotz et al., 2012; Ellsworth and Johnson, 2013], have turned to portable and embedded systems. Following this trend, DMI research has recently started capitalising on such systems rather than on general-purpose architectures. After many years of close symbiosis with the laptop/desktop computer, musical instruments are increasingly abandoning it in favour of onboard audio processing. This shift has left an important mark in both academia [Berdahl et al., 2013; Oh et al., 2010; Baclawski and Jackowski, 2013] and industry (e.g., Aleph 1, ToneCore DSP 2 and OWL 3).

This is because DMIs require a specific set of design features to provide the user (i.e., a performer, a composer, a casual player) with a musical experience not too far from that of acoustic and electric instruments [Berdahl and Ju, 2011]. This natural comparison with well-known “devices”, such as the piano and the guitar, underlines qualities like reconfigurability, independence/autonomy and high responsiveness, which can be assured only on a dedicated system.

1 http://monome.org/aleph/

2 http://line6.com/tcddk/

3 http://hoxtonowl.com/

As designers and developers of novel open-source DMIs, we have decided to explore the promising and evolving world of embedded Linux technologies, focusing first on the concept of responsiveness. The work presented here shows the results of a series of tests aimed at measuring the latency of a BeagleBone Black 4 board (BBB), used as the core of a self-contained, open-source musical instrument. Different hardware and software configurations based on the same Linux kernel (v3.8.13) have been analysed under different CPU loads and levels of code optimisation.

This work is part of a larger structured study whose goal is to assess the longevity, usability and reconfigurability of DMIs compared to the standards of acoustic and electric musical instruments.

2 Related Work 

In 2011 Berdahl and Ju presented the Satellite CCRMA [Berdahl and Ju, 2011], a platform for teaching and practicing interaction design for diverse musical applications, based entirely on embedded Linux. It runs on a BeagleBoard 5 coupled with an Arduino Nano 6 and a breadboard, to support the use of sensors and actuators. Two years later, Berdahl et al. upgraded the platform, enabling compatibility with more powerful boards such as the Raspberry Pi 7 and the BeagleBoard-xM 8. They also expanded the range of possible applications to include networking capabilities and hardware-accelerated graphics [Berdahl et al., 2013]. The project is based on a Fedora distribution with a custom low-latency kernel.

4 http://beagleboard.org/products/beaglebone%20black

5 http://beagleboard.org/Products/BeagleBoard

6 http://arduino.cc/en/Main/arduinoBoardNano

MacConnell et al. have recently presented a good example of how new-generation embedded Linux boards can extend the capabilities of a musical instrument [MacConnell et al., 2013]. This work introduces a BBB-based open framework for autonomous music computing that eschews the use of the laptop on stage. Important features of embedded systems are used here to give the instruments designed with the framework high degrees of autonomy and reconfigurability. The authors also include data regarding the latency of the system running Ubuntu 12.04. Since mainly FX-like processes are addressed, only the audio-throughput time is measured, under different system load configurations. Results vary between 10 and 15 ms, depending on the kind of filtering.

The need to measure the responsiveness of computer-based systems is not recent at all, especially in the context of real-time operating systems. In 2002, Abeni et al. used a series of micro-benchmarks to identify major sources of latency in the Linux kernel [Abeni et al., 2002]. They also evaluated its effects on a time-sensitive application, in particular an audio/video player.

Moving towards computer-based audio systems, it is worth mentioning the work by Wright et al. [Wright et al., 2004]. They compare the latency of MacOS, Red Hat Linux (with real-time kernel patches) and Windows XP, both in an audio-throughput configuration and in an event-based audio configuration. To estimate the event audio latency, they measure the time delay between two sounds: the sound produced by pressing a button on the keyboard, and a sinusoidal audio output that the same button press triggers on the computer.
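The measurement principle described above can be illustrated with an offline analysis of a two-channel recording. One channel captures the mechanical key click, the other the device output, and a minimal threshold-based onset detector yields the delay between them. This is a simplified sketch under those assumptions, not the exact procedure of Wright et al. or of the tests in this paper.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

// Index of the first sample whose magnitude reaches `threshold`, if any.
std::optional<std::size_t> firstOnset(const std::vector<float>& x, float threshold) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::fabs(x[i]) >= threshold) return i;
    }
    return std::nullopt;
}

// Latency in milliseconds between the onset in the reference channel (key
// click) and the onset in the output channel, both recorded at sampleRate.
std::optional<double> latencyMs(const std::vector<float>& reference,
                                const std::vector<float>& output,
                                float threshold, double sampleRate) {
    assert(sampleRate > 0.0);
    const auto a = firstOnset(reference, threshold);
    const auto b = firstOnset(output, threshold);
    if (!a || !b || *b < *a) return std::nullopt;  // no valid pair of onsets
    return 1000.0 * static_cast<double>(*b - *a) / sampleRate;
}
```

For instance, onsets 441 samples apart in a 44.1 kHz recording correspond to 10 ms of latency.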

3 System Configuration 

The tests presented throughout this work aim at evaluating the capabilities of a BBB-based system when used as a development platform for DMIs. The chosen configuration includes most of the standard components required to synthesise and control audio in real time on a self-contained instrument (i.e., without the need for laptops or any additional external devices). Details on each of these components are given in the following subsections.

7 http://www.raspberrypi.org/

8 http://beagleboard.org/Products/BeagleBoard-xM

3.1 Board and OS 

The BBB is an embedded Linux board based on a 1 GHz ARM Cortex-A8 processor. It is shipped with an embedded Angstrom Linux distribution (v3.8), optimised to run on embedded architectures. This distribution is meant to run general-purpose applications and is not specifically audio-oriented. Our first intent was to explore the capabilities of this default board configuration without introducing any changes to the underlying kernel. We believe this approach can be very useful to the community of embedded developers. It gives them a clear outline of the built-in audio capabilities of the BBB, thus helping them choose the right board.

Although the BBB belongs to a new generation of compact and fully accessorised boards (e.g., HDMI, µSD card slot), it does not natively provide any analog audio interfacing. To test the performance of this board for high-quality audio synthesis, two commercially available audio interfaces were chosen for comparison. Both were configured for use with the board so as to provide real-time audio output.



Figure 1: The USB interface and the Audio 
Cape attached to the BeagleBone Black. 

3.2 Audio Interfaces 

We tested one USB interface and one BeagleBone expansion “cape” providing audio output. This choice aimed at comparing the two most common solutions used by BBB users for generating audio output. Figure 1 shows both interfaces attached to the board.

The first interface used is the Turtle Beach Amigo II USB interface. 9 This device is bus-powered and USB 2.0 class-compliant. Once attached, it is automatically recognised as a new hardware interface, so it simply has to be selected as the device in audio applications. This interface provides two stereo 3.5 mm jack sockets, one for input and the other for output. Only the latter has been used for our tests.

The second interface used for our tests is the BeagleBone Audio Cape 10. This device effectively acts as an extension to the BBB: it simply attaches to the top of the board to provide an audio interface. Audio data are exchanged to and from the BBB over an I2S connection. Unlike the USB interface, this device requires some manual configuration to be recognised as a plug-in hardware interface. The on-board HDMI audio virtual cape must be disabled so that the firmware can load the Audio Cape as the main audio device. This can easily be done by changing the U-Boot parameters passed at boot time. Like the USB interface, this cape includes a pair of stereo 3.5 mm input/output jack sockets.

3.3 Audio Synthesis 

Two different audio backend systems were developed in C++ and cross-compiled to run on the ARM Cortex-A8 processor, one based on ALSA, the other based on JACK. ALSA and JACK are currently used by a large number of Linux audio developers.

The audio backend system based on ALSA (Advanced Linux Sound Architecture 11) essentially comprises an audio engine and a parametric synthesizer. The synthesizer produces frame data and is connected to the audio engine, which collects this frame data and transports it to the selected output device.

The audio backend system based on JACK (JACK Audio Connection Kit 12 ) was similarly designed. A fundamental difference between these two APIs is that JACK uses a client-server model between operating processes and output devices. For this reason, only a synthesizer class was designed to operate as the client process, while a standard JACK server acts as the audio engine for transport. In this configuration the server pulls audio from the client process every time it requires new output data; this is in stark contrast to the ALSA system, in which audio is pushed to the output device.

9 www.turtlebeach.com

10 http://elinux.org/CircuitCo:Audio_Cape_RevA

11 www.alsa.opensrc.org

12 www.jackaudio.org
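The push/pull distinction can be made concrete with a small simulation. This is a minimal Python sketch, not the authors’ C code: Synth, FakeDevice, push_loop and PullServer are hypothetical names, and the “device” is a list that records blocks. In a C program the push side would typically call ALSA’s snd_pcm_writei() and the pull side would register a callback with JACK’s jack_set_process_callback(); the sketch only mirrors who owns the audio loop.

```python
class Synth:
    """Toy synthesizer that renders a counting ramp, so blocks are easy to check."""

    def __init__(self):
        self.t = 0  # running frame counter

    def render(self, nframes):
        out = [float(self.t + i) for i in range(nframes)]
        self.t += nframes
        return out


class FakeDevice:
    """Stands in for the sound card: it just records every block it receives."""

    def __init__(self):
        self.blocks = []

    def write(self, block):
        self.blocks.append(block)


def push_loop(synth, device, period, n_periods):
    # Push (ALSA-style): the application owns the loop, renders a period,
    # then hands it to the device (snd_pcm_writei() plays this role in C).
    for _ in range(n_periods):
        device.write(synth.render(period))


class PullServer:
    # Pull (JACK-style): the server owns the loop and asks the registered
    # client callback for each period (cf. jack_set_process_callback() in C).

    def __init__(self, device, period):
        self.device = device
        self.period = period
        self.callback = None

    def set_process_callback(self, callback):
        self.callback = callback

    def run_cycles(self, n_cycles):
        for _ in range(n_cycles):
            self.device.write(self.callback(self.period))
```

Both paths deliver identical blocks; the difference is only whether the application or the server decides when render() runs, which is why the JACK build needed just a synthesizer client.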

Concerning audio synthesis, the synthesizers implemented for ALSA and JACK both generate simple sine waves by reading from a wavetable. Both systems are configured to provide CD-quality audio (i.e., 16-bit resolution, 44.1kHz sample rate) and to run the audio thread at the maximum priority level using real-time FIFO scheduling.
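A wavetable oscillator of the kind described above might look like the following sketch. The table length (1024), the truncating (non-interpolating) lookup and the amplitude are assumptions, since the paper does not specify them; only the 44.1kHz rate and the 16-bit output come from the text.

```python
import math

SAMPLE_RATE = 44100  # CD-quality rate used in the tests
TABLE_SIZE = 1024    # assumption: the table length is not given in the paper

# One cycle of a sine wave, computed once and then read repeatedly.
SINE_TABLE = [math.sin(2.0 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]


class WavetableOsc:
    """Non-interpolating wavetable oscillator producing signed 16-bit samples."""

    def __init__(self, freq, amp=0.5):
        self.phase = 0.0
        self.inc = freq * TABLE_SIZE / SAMPLE_RATE  # table steps per output sample
        self.amp = amp

    def render(self, nframes):
        out = []
        for _ in range(nframes):
            value = self.amp * SINE_TABLE[int(self.phase)]  # truncating lookup
            out.append(int(round(value * 32767)))           # scale to 16-bit range
            self.phase = (self.phase + self.inc) % TABLE_SIZE
        return out
```

For example, WavetableOsc(440.0).render(64) yields one 64-frame period of a 440 Hz tone at half amplitude.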

A parallel control thread was included to manage user input through the keyboard and to access the read/write capabilities of the BBB’s general-purpose input/output (GPIO) pins.
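On the kernel 3.8 images used here, a userspace thread can drive a pin through the legacy sysfs GPIO interface (/sys/class/gpio). The sketch below assumes the pin is already multiplexed as a GPIO and that the process may write to sysfs; the pin number in the example is hypothetical, and the base path is a parameter only so that the function can be tried against a temporary directory.

```python
import os


def gpio_set(pin, high, base="/sys/class/gpio"):
    """Drive a GPIO high or low through the legacy Linux sysfs interface."""
    pin_dir = os.path.join(base, f"gpio{pin}")
    if not os.path.isdir(pin_dir):
        # Ask the kernel to expose the pin (usually requires root on the board).
        with open(os.path.join(base, "export"), "w") as f:
            f.write(str(pin))
    # Configure the pin as an output, then set its level.
    with open(os.path.join(pin_dir, "direction"), "w") as f:
        f.write("out")
    with open(os.path.join(pin_dir, "value"), "w") as f:
        f.write("1" if high else "0")
```

In the test procedure, a call such as gpio_set(60, True) would be the first action after the keystroke, immediately before audio output starts.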

4 Performance Test 

The responsiveness and audio quality of four specific configurations were tested, combining the two audio backends (ALSA and JACK) with the two audio devices (USB and Audio Cape). Responsiveness was evaluated as the latency between the triggering of an audio task producing a waveform and the actual output of that waveform through the audio interface; audio quality was assessed by the incidence of under-runs.

The performance of each configuration was measured by running the audio task in 3 distinct test scenarios. Each of these scenarios (described in the following subsection) involved testing different period and buffer size configurations. As the focus of the test is very low latency, only the smallest possible period and buffer sizes were examined. In addition, each measurement was repeated with 3 different optimisation settings of the CTT cross-compiler (Linaro GCC 4.7 hosted on an x86_64 architecture), using the -O1, -O2 and -O3 flags.

4.1 Test Scenarios 

The first scenario involved the generation of a simple monophonic tone. As mentioned in Section 3.3, a simple lookup table was used to generate the frames for this tone.

The second scenario consisted of creating the same monophonic tone as in the first scenario, but while a top background process was active with a fast refresh rate (passing the command-line argument “-d 0.1”) at standard priority. This scenario allowed the efficiency of audio synthesis to be observed and measured while the system was under heavier load.

The final test scenario was concerned with the generation of a more complex polyphonic tone, achieved through the summation of three harmonically related monophonic oscillators. The addition of these two extra tones was considered a suitably harder synthesis task than the simple monophonic tone. It must be noted that no background process was executed during this scenario.
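The polyphonic scenario can be sketched as a sum of partials. Which harmonics were used is not stated, so partials at f0, 2*f0 and 3*f0 are an assumption, as is the 1/3 gain that keeps the sum from clipping; for brevity the sketch calls math.sin directly rather than reading a wavetable.

```python
import math

SAMPLE_RATE = 44100


def polyphonic_block(f0, nframes, start_frame=0, n_partials=3):
    """Render nframes of a tone made of n_partials harmonics of f0.

    Partial k (k = 1..n_partials) has frequency k * f0; the sum is divided by
    n_partials so that the output stays within [-1.0, 1.0].
    """
    out = []
    for n in range(start_frame, start_frame + nframes):
        t = n / SAMPLE_RATE
        total = sum(math.sin(2.0 * math.pi * k * f0 * t)
                    for k in range(1, n_partials + 1))
        out.append(total / n_partials)
    return out
```

Per frame this does three oscillator evaluations instead of one, which is what makes the third scenario a heavier synthesis load than the first.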



Figure 2: The setup to measure latency when 
using the USB interface. 

4.2 Procedure 

The BBB was connected by USB; tests were performed over an ssh connection via the BBB’s USB network connection. One of the board’s GPIOs was attached to the first input channel of an oscilloscope. The audio output was connected to the second channel of the oscilloscope. For the case of the USB interface, the complete setup is shown in Figure 2.

In detail, the test procedure ran as follows. Upon starting one of the executables, the generated system (ALSA or JACK) was programmed to initialise itself but then wait for user input (i.e. a keystroke) before beginning to fill the output buffers with frames. Once the keystroke signal was received across the serial connection, the first task of the system was to drive the GPIO connected to the oscilloscope from low to high. Only immediately after this could the audio cycle begin, outputting the signal into the oscilloscope. The oscilloscope was set to trigger a single display capture on both channels upon detection of a rising edge in the GPIO signal. The time distance between the GPIO rising edge in the display and the beginning of the captured audio output hence provided a measurement of the operational latency (Figure 3); each measurement was repeated 5 times and an average value calculated.
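The latencies were read off the oscilloscope display. If the two channels were instead exported as sample arrays, the same measurement could be automated as below. This is a hypothetical post-processing sketch: the capture format and the thresholds (1.5 V for the GPIO edge, 0.05 for the audio onset) are assumptions.

```python
def first_crossing(samples, threshold):
    """Index of the first sample whose magnitude exceeds threshold, or None."""
    for i, v in enumerate(samples):
        if abs(v) > threshold:
            return i
    return None


def latency_ms(gpio, audio, scope_rate_hz, gpio_thresh=1.5, audio_thresh=0.05):
    """Time from the GPIO rising edge to the first audible sample, in ms."""
    edge = first_crossing(gpio, gpio_thresh)
    onset = first_crossing(audio, audio_thresh)
    if edge is None or onset is None or onset < edge:
        raise ValueError("no valid edge/onset pair in this capture")
    return (onset - edge) * 1000.0 / scope_rate_hz


def mean_latency(captures, scope_rate_hz):
    """Average latency over repeated (gpio, audio) captures."""
    values = [latency_ms(g, a, scope_rate_hz) for g, a in captures]
    return sum(values) / len(values)
```

With five captures per configuration, mean_latency() reproduces the averaging step described above.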


Figure 3: Examples of oscilloscope display capture for scenarios 1 and 3. Latency is measured as the horizontal distance (time gap) between the GPIO signal rising edge (in yellow) and the start of the waveform (in light blue).

It must be noted that the period and buffer sizes chosen depended upon both the system type and the target interface; configurations that worked well for one pair did not necessarily run well for another. The six smallest usable configurations were tested for each system and interface pair.

In addition to measuring the output latency, the quality of the output audio was also observed; this observation relied upon noting the frequency of frame dropout (under-run activity) and any visible distortion displayed on the oscilloscope.

5 Results 

The measurements are presented and discussed here, first globally and then by analysing more specific cases. Both latency and output quality (under-runs) are taken into account, and their individual contributions are combined.

5.1 Latency

The latency results were highly consistent across trials: when using the USB audio interface with both the ALSA and JACK systems (Figures 4 and 6), individual measurements generally differed by at most one millisecond, with only a few exceptions; this was true regardless of the optimisation used. Measurements concerning the Audio Cape (Figures 5 and 7) were even more consistent than for the USB interface: across the five measured latencies, measurements varied by only half a millisecond, without exceptions.

As expected, for all configurations, latency is 
directly related to buffer and period sizes. No 
significant, systematic latency differences were 
noted amongst the three optimisation settings. 
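As a point of reference, the nominal time spanned by a buffer or a period follows directly from its size in frames. The sketch below computes these figures at the 44.1kHz rate used in the tests; they are back-of-the-envelope values only, not a model of the measured event-to-audio latency, which also includes startup, transfer and converter delays.

```python
SAMPLE_RATE = 44100  # Hz, as configured for both backends


def frames_to_ms(frames, rate=SAMPLE_RATE):
    """Duration of a number of frames, in milliseconds."""
    return 1000.0 * frames / rate


def nominal_figures(buffer_frames, period_frames, rate=SAMPLE_RATE):
    """Back-of-the-envelope numbers for one buffer/period configuration."""
    return {
        "periods_per_buffer": buffer_frames // period_frames,
        # interval at which the device consumes one period
        "period_ms": frames_to_ms(period_frames, rate),
        # maximum amount of audio that can be queued ahead of the DAC
        "buffer_ms": frames_to_ms(buffer_frames, rate),
    }
```

For instance, buffer 512 with period 32 gives 16 periods and about 11.6 ms of buffered audio, while buffer 128 with period 8 spans about 2.9 ms, with each period lasting under 0.2 ms.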

However, the choice of monophonic versus polyphonic synthesis and the system load introduce unexpected variations in the latency. These variations may reflect a delay in starting up the ALSA or JACK system, rather than a difference in steady-state latency once the audio rendering is running. Our test procedure toggles a GPIO pin and then immediately begins filling the audio buffers; it is possible that this initial startup produces an additional transient delay compared to reacting to an event once audio is already running. In any case, the difference between test conditions is always less than 1ms.

Figure 4: ALSA latency measurements for the USB interface.

Figure 5: ALSA latency measurements for the Audio Cape.

Figure 6: JACK latency measurements for the USB interface.

Figure 7: JACK latency measurements for the Audio Cape.

It can be noted that, on both systems, the Audio Cape allows for smaller buffer and period configurations. Unfortunately, the set of available configurations varies between the two systems, making a direct performance comparison impossible. For the only overlapping configuration (buffer 512, period 32), JACK performed unexpectedly badly, showing a higher latency than configurations based on larger period sizes.

Conversely, the USB interface results cover the same set of configurations for both ALSA and JACK, so a quantitative comparison is possible here. For the smallest period size (i.e. 64 frames) the JACK system shows better or equal performance, while for larger sizes ALSA proved remarkably more responsive. In particular, the last 3 cases listed in Figure 6 show that JACK’s latency is almost the same and quite high regardless of the conditions, i.e. buffer/period sizes, test scenario and optimisation level.

5.2 Under-runs 

The tests highlighted a certain incidence of under-runs, whose effects varied according to the chosen configuration. Generally, they occurred when the lowest buffer and period sizes were tested. The different optimisation settings also proved to strongly affect their incidence.
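Why the smallest sizes are the most fragile can be illustrated with a deliberately simplified model of a playback buffer. The function below is a toy written for this discussion, not how ALSA or JACK detect xruns: the buffer starts full, each period costs some wall-clock time to produce (rendering plus any time lost to other processes, such as top in scenario 2), and an under-run is counted whenever the queued audio runs out. The render costs in the example are made up.

```python
def count_underruns(render_costs_ms, period_frames, buffer_frames, rate=44100):
    """Toy playback-buffer model; returns the number of under-runs.

    render_costs_ms[i] is the (hypothetical) wall-clock time spent producing
    period i. While the producer works, the device keeps playing and drains
    that much audio; then the new period is appended to the queue.
    """
    period_ms = 1000.0 * period_frames / rate
    capacity_ms = 1000.0 * buffer_frames / rate
    queued_ms = capacity_ms  # start with a full buffer
    underruns = 0
    for cost in render_costs_ms:
        queued_ms -= cost          # device drains while we render
        if queued_ms < 0.0:
            underruns += 1         # the device ran dry
            queued_ms = 0.0        # silence was played; restart from empty
        # a full buffer blocks the producer, so the queue never exceeds capacity
        queued_ms = min(queued_ms + period_ms, capacity_ms)
    return underruns
```

A single 5 ms scheduling hiccup empties a 128-frame (about 2.9 ms) buffer but is absorbed by a 2048-frame one, and a sustained render cost above the period length eventually under-runs at any buffer size.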

5.2.1 ALSA 

Using ALSA on the USB interface, frame dropout was observed only when the buffer and period were set to the minimum values (i.e. 128 and 8 frames respectively). This was true across all builds of the system and for all scenarios. During the first scenario (pure tone) and the third scenario (polyphonic tone), the observed dropouts were not too severe, normally occurring only once or twice at the beginning of synthesis. The second scenario seemed to generate significantly higher rates of frame dropout, leading to audible clicks. An interesting observation is that the O2-optimised versions of the system always appeared to exhibit the fewest under-runs.

In the case of the ALSA system using the Audio Cape, frame dropout occurred only with the smallest period size configurations (period size of 8), regardless of the buffer setting. Again, this was true across all builds of the system and all scenarios. The first and third scenarios produced a very small amount of under-run activity, interestingly only for the first period and buffer size configuration tested. As with the USB interface, these dropouts were very minor, normally occurring only once or twice at the beginning of synthesis. In the second scenario the amount of frame dropout increased slightly. Interestingly, this time the O3-optimised versions exhibited the greatest improvement, almost completely preventing under-runs.

5.2.2 JACK 

Since in JACK the stream to the audio device is not managed by the client, under-runs can occur only on the server. For the JACK system using the USB audio interface, most frame dropout issues occurred with the first size configuration (buffer 128, period 16), the second (buffer 128, period 32) and the fourth (buffer 256, period 32). In the first and third scenarios, the frame dropouts noted for the first period and buffer size configuration were very severe for the O1 and O2 optimisations. The amount of under-run activity for this configuration made it very difficult to gather any latency measurements; sometimes the JACK server would under-run continuously without the client even being active. The O3 optimisation, however, did not experience this problem for this configuration; under-runs were noted, but they were nowhere near as severe. In the second scenario, far fewer dropouts were observed consistently across all builds, a surprising result. Again, the O3 optimisation provided the best performance enhancement.


In the case of the JACK system using the Audio Cape, frame dropout appeared more frequently for the first three period and buffer size configurations. The first scenario produced a very small amount of under-run activity for all three optimisations; only during the first scenario were any frame dropouts observed. The nature of these under-runs, however, was different from those previously observed: during synthesis the server ran smoothly, while under-runs were noted only after the client had been disconnected. During the second scenario the amount of frame dropout increased slightly; the type of under-run seen in the first scenario (after termination of the client) occurred more frequently. This behavior was exhibited in both the first and second period and buffer size configurations for the O1 and O3 optimisations. With the O2 optimisation this behavior was not observed; instead, a severe under-run issue occurred with the second period and buffer size configuration, whereby the server immediately began to under-run before the client had even been launched. In the third scenario, under-run behavior similar to that of the previous two scenarios was seen, with under-runs occurring after termination of the client. No frame dropouts were observed for the O2 optimisation in this scenario.

6 Conclusion 

Throughout this paper we presented a study aimed at evaluating the responsiveness of a Linux embedded system. As part of a larger study on DMI design, we focused on testing the latency and quality of audio output on a BeagleBone Black board running the standard Angstrom distribution with no kernel modifications. Two different audio backend systems were considered, one based on ALSA and the other on JACK, and measurements using a USB audio interface and an Audio Cape were compared. The test monitored event-to-audio latency and included the monitoring of under-run activity.

Data analysis showed remarkably low latency values for both the ALSA and JACK audio systems, especially for small buffer and period size configurations. In particular, the use of the Audio Cape allows for latency values lower than 3ms. In some of the CPU scenarios considered, the audio stream presented dropouts and clicks, especially when using small buffer and period size configurations. However, using the different levels of code optimisation available in the chosen compiler (cross-GCC -O1, -O2 and -O3) completely fixed the audio quality in most of the tested configurations.

No previous work has delved into the audio capabilities of the BeagleBone Black while running the default Linux distribution (Angstrom with kernel 3.8). Compared to other distributions, like Ubuntu or Fedora, Angstrom on the BeagleBone Black proved to support very low latency configurations without the need for a customised kernel. In the context of digital musical instrument design, this feature is remarkable and makes the BeagleBone Black an appealing platform for instrument development.

Abstract

Mephisto is a small, battery-powered, open source Arduino-based device. Up to five sensors can be connected to it using simple 1/8” stereo audio jacks. The output of each sensor is digitized and converted to OSC messages that can be streamed over a WIFI network to control the parameters of any Faust-generated app.

1 Introduction

In the past few years, the Faust 1 [Orlarey and Letz, 2002] programming language has been used increasingly by researchers and developers to implement new algorithms for real-time audio signal processing. As a result, dozens of open source Faust effects and synthesizers are now freely available. For example, Julius Smith’s libraries [Smith, 2012] and the Faust-STK [Michon and Smith, 2011] provide a large array of objects ranging from the simplest lowpass filter to complex feedback delay networks and physical models of musical instruments.

However, we observed that these technologies remain relatively inaccessible to musicians who don’t have the knowledge (or the desire) to compile a Faust object on their laptop. In other words, these elements are not “plug and play”.

One of the tools already at our disposal to facilitate the sharing and use of Faust objects is the Online Compiler 2 [Michon and Orlarey, 2012]. This web app contains an interactive catalog of Faust programs that can be compiled to any of the available Faust architectures and then downloaded. Users can easily add their own Faust code to the catalog or modify existing elements. Even if this very high-level tool makes the creation of plug-ins, etc., very easy, it targets users who have some knowledge of computer music and who know how to use a VST 3 , an external audio interface, etc.

1 http://faust.grame.fr.

2 http://faust.grame.fr/compiler.

Thus, to make things even easier, we started to think about a Faust stomp box that could be based on an embedded Linux system such as a Raspberry Pi 4 . It would have been able to connect to the online compiler to provide its user with a list of the objects stored in the catalog. A download button to cross-compile and then download the effect or synthesizer to the Faust box would have made this system very easy to use.

However, even though Raspberry Pis are great prototyping platforms, their computation power is quite limited. Also, their boot time can be a problem for impatient musicians. We then realized that a smartphone or a tablet could do a similar job and would be more user-friendly. While a faust2ios architecture already existed, Apple’s products presented a huge disadvantage compared to Android phones: in order to be installed, an app has to be approved by the App Store, which made our concept of a customizable stomp box impossible to implement. Thus we opted for Android and created a faust2android [Michon, 2013] architecture.

Another key component of our project was to provide an easy way for musicians to control the different parameters of a Faust object during a live performance. While smartphones offer basic built-in controllers (touch screen, accelerometer, etc.), these elements are not very practical to interact with if the user is playing an instrument and processing its sound with their phone. Indeed, many instruments require the use of both hands, making it inconvenient to interact with another interface.

Mephisto was created to solve this problem.

3 Virtual Studio Technology.

4 http://www.raspberrypi.org/.

Figure 1: View of Mephisto from the top.

It is a small battery-powered device that can easily be attached to someone’s belt (cf. figures 1 and 2). Up to five sensors can be connected to it using simple 1/8” stereo audio jack plugs. The output of each sensor is digitized and converted to OSC 5 [Wright, 2005] messages that can be streamed over a WIFI network to control the parameters of any Faust-generated app. As OSC is a standard protocol, Mephisto can be used with faust2android apps but is also compatible with most of the Faust architectures and programs enabling OSC communication.
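Each OSC message that Mephisto streams carries an address and, in the simplest case, one float value. The sketch below encodes a single-float message following the OSC 1.0 encoding rules (NUL-terminated strings padded to 4-byte boundaries, a type-tag string, big-endian float32) and sends it as one UDP datagram. It is written for illustration and is not the oscuino code used in the firmware; host and port are up to the caller, and addresses such as /osc0 are borrowed from the configuration screenshot.

```python
import socket
import struct


def _osc_string(s):
    """OSC string: ASCII bytes, NUL-terminated, padded to a multiple of 4."""
    b = s.encode("ascii") + b"\x00"
    return b + b"\x00" * (-len(b) % 4)


def osc_message(address, *floats):
    """Encode an OSC 1.0 message whose arguments are all float32."""
    type_tags = "," + "f" * len(floats)
    payload = b"".join(struct.pack(">f", x) for x in floats)  # big-endian float32
    return _osc_string(address) + _osc_string(type_tags) + payload


def send_osc(host, port, address, *floats):
    """Send one message as a single UDP datagram (no OSC bundle)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(osc_message(address, *floats), (host, port))
```

Because every field is 4-byte aligned, osc_message("/osc0", 0.5) is exactly 16 bytes: 8 for the address, 4 for the type tags and 4 for the value.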

As a “DIY” 6 open source project, Mephisto 
only uses open source hardware (Arduino, etc.) 
and was designed to be easily built by anyone. 
A web page giving the instructions to build your 
own Mephisto has been created 7 . 

2 Hardware 

Designing small-scale open source hardware can be a rather complicated task. Indeed, while software can be easily deployed and shared, in many cases hardware requires a production chain, etc. For this reason, Mephisto has been designed to be easily and quickly built by anyone.

2.1 The Case 

To make it as easy as possible for users, the case of Mephisto is 3D printable. It has been designed with Blender 8 , an open source program for 3D design that has some CAD features.

5 Open Sound Control: http://opensoundcontrol.org/.

6 Do It Yourself.

7 http://ccrma.stanford.edu/~rmichon/mephisto.

8 http://www.blender.org/.



Figure 2: Use example of Mephisto. 



Figure 3: 3D model of Mephisto’s case as it 
appears in Blender. 

We’re perfectly aware that not everyone has a 3D printer at home, but 3D-printed models can now be ordered online easily and cheaply. Moreover, Mephisto’s case has a very simple design and can be printed on almost any 3D printer.

2.2 Electronics 

Mephisto is based on an Arduino Uno 9 and a WIFI Shield 10 (cf. figure 4). Five of the Arduino’s analog inputs are used in Mephisto to digitize the output signals of the sensors. Simple 1/8” stereo audio jacks (three pins) are used to bring power to the sensors and to retrieve their output signal (cf. figure 1).

9 http://arduino.cc/en/Main/arduinoBoardUno.

10 http://arduino.cc/en/Main/ArduinoWiFiShield.

Figure 4: An Arduino Uno and its WIFI shield.

Figure 5: Mephisto flow chart.

Figure 6: Example of sensors that can be connected to Mephisto with their 1/8” jack plugs.

Users can configure basic parameters such as the WIFI network to connect to or the IP address of the host using an LCD screen and a navigation button interface.

Mephisto can be powered with any DC power adapter between seven and nine volts or with a simple nine-volt battery. We considered using lithium-ion batteries instead, but these are more expensive and need a special charger. With five simple sensors plugged into it, Mephisto can run for about four hours on a single nine-volt battery. Moreover, the battery is very easy and quick to replace.


Dozens of sensors have been tested and can easily be prepared to work with Mephisto. Our website explains how to set up an accelerometer, a pressure sensor, a flex sensor, trim pots, etc. 11 .

3 Software 

3.1 Arduino Firmware 

The Arduino firmware carries out a large number of tasks. It retrieves and digitizes the output signals of the sensors and scales them according to the parameters specified in the interface program (cf. §3.2). Then it converts them to OSC messages using oscuino 12 . The OSC address of each sensor can be configured in the interface program (cf. §3.2).
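The scaling step can be pictured as a linear map from the Arduino Uno’s 10-bit analogRead() range (0 to 1023) onto the Min/Max range configured for each OSC address. This Python sketch only illustrates the idea (the firmware itself is Arduino C++, where the built-in map() function performs a similar integer mapping); the optional in_min/in_max calibration arguments are an assumption, not a documented Mephisto feature.

```python
ADC_MAX = 1023  # Arduino Uno analogRead() returns 10-bit values, 0..1023


def scale_reading(raw, out_min, out_max, in_min=0, in_max=ADC_MAX):
    """Linearly map a raw ADC reading onto the range configured for its OSC address.

    in_min/in_max let a sensor that only uses part of the ADC range (e.g. a
    flex sensor) be stretched to the full output range; readings outside
    [in_min, in_max] are clamped first. in_max must differ from in_min.
    """
    raw = max(in_min, min(in_max, raw))
    return out_min + (raw - in_min) * (out_max - out_min) / (in_max - in_min)
```

With the 0 to 100.0 range shown in the configuration screenshot, scale_reading(1023, 0.0, 100.0) returns 100.0.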

The firmware also handles the user interface 
implemented through the LCD screen and the 
navigation buttons. 

3.2 Interface JAVA Program 

Even though Mephisto provides its own very simple configuration interface by means of its LCD screen and navigation buttons, a JAVA program 13 that can run on both Linux and Mac OS X was created to carry out this task more precisely.

This simple program makes it possible to configure the OSC address and the range of the OSC messages sent by Mephisto for each jack input. It is also possible to choose which sensor is connected to which jack in order to carry out some scaling on its output signal (even if the Mephisto website explains how to do basic electronic scaling on sensors, it is often necessary to adjust their output range computationally).

11 http://ccrma.stanford.edu/~rmichon/mephisto.

12 http://cnmat.berkeley.edu/oscuino.

13 https://github.com/rmichon/mephisto/.

Figure 7: Screenshot of the interface program used to configure Mephisto from a desktop.

The Mephisto configuration program also makes it possible to pre-configure the WIFI network to which Mephisto will connect, its password if it is protected, the IP address of the host and the rate at which messages are sent.

This interface program formats and customizes the Arduino firmware according to the provided parameters. It then compiles the firmware and, if an Arduino is connected to one of the USB ports, uploads it using ino 14 . As ino only works on Linux and Mac OS X, the interface is only usable on these platforms, even though it can also be executed on Windows.
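One plausible way to “format and customize” firmware from user parameters is to generate a header of constants before compiling. The template, macro names and render_config() below are entirely hypothetical (the paper does not describe the generated code), and the sketch stops short of invoking ino. It also performs no escaping, so a password containing double quotes would produce invalid C.

```python
from string import Template

# Hypothetical header template; the real Mephisto firmware layout is not documented here.
CONFIG_TEMPLATE = Template(
    '#define WIFI_SSID "$ssid"\n'
    '#define WIFI_PASS "$password"\n'
    '#define HOST_IP "$host"\n'
    "#define UPDATE_RATE $rate\n"
    "$inputs"
)


def render_config(ssid, password, host, rate, inputs):
    """Build a C header from user settings.

    inputs: list of (osc_address, out_min, out_max), one entry per jack input.
    """
    lines = []
    for i, (addr, lo, hi) in enumerate(inputs):
        lines.append(f'#define OSC_ADDR_{i} "{addr}"\n')
        lines.append(f"#define OUT_MIN_{i} {lo}\n")
        lines.append(f"#define OUT_MAX_{i} {hi}\n")
    return CONFIG_TEMPLATE.substitute(ssid=ssid, password=password, host=host,
                                      rate=rate, inputs="".join(lines))
```

The generated text would then be written next to the firmware sources before the build and upload step.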

4 Conclusion 

Mephisto is an open source project that improves and simplifies the control of sound effects and synthesizers running on a mobile device. Any kind of sensor can be connected to it and used as an OSC controller for live performance.

The Faust online compiler, together with Mephisto, faust2android and the Faust catalog of sound effects and synthesizers, greatly simplifies the use of Faust objects by musicians.

The use of 3D printing in the framework of open source hardware projects makes things a lot easier than in the past. Indeed, users don’t need any background in manufacturing and only have to take care of putting the different components together.

Similarly, Arduinos are relatively self-contained environments that significantly reduce the size of electronic circuits, making projects like Mephisto easy to build at home.

Abstract

The historical origin of currently used programming models for real-time computer music is examined, with an eye toward a critical rethinking given today’s computing environment, which is very different from what prevailed when some major design decisions were made. In particular, why are we tempted to use a process or thread model? We can provide no simple answer, despite their wide use in real-time software.

Keywords

1 Introduction

The language of real-time computer music borrows from three antecedents that were fairly well in place by 1985, before the field of real-time computer music took its current form. Classical computer music models, starting with Max Mathews’s MUSIC program (1957), were well studied by that time. The field of computer science, particularly operating system design, was also taking shape; perhaps it may be said to have matured by 1980 with the widespread adoption of Unix. Meanwhile, a loosely connected network of electronic music studios arose in the 1950s whose design is directly reflected in the patching paradigm that is nearly universal in modern computer music environments.

Both computer science and music practice relied on a notion of parallelism, albeit in very different forms and terms. In computer science, abstractions such as process and thread arose from the desire to allocate computing resources efficiently to users. In music, thousand-year-old terms like voice and instrument imply parallelism, both on written scores, as multi-part music was notated for both practical and dogmatic reasons, and in real time, as live performers sang or played the music in ensembles.

In both computer science and computer music language, abstractions modeled on processes or threads are used to try to describe the passage of time and also to express, and/or take advantage of, parallelism. But the aims in computer science (efficiency) are different from those in computer music (as an aid to organizing musical computation).

Figure 1: Submitting jobs to a computer circa 1960.

In the sections that follow I will try to trace these developments historically to see why we treat processes and related concepts the way we now do in real-time computer music systems. I hope to help clarify why the current practice is what it is, and perhaps contribute to thinking about future computer music programming environments.

2 Computer science terminology 

Figure 2: Interactive jobs may stall in mid-computation to ask the operator for more information.

Figure 3: Process intercommunication using messages and shared memory.

In classical operating system design theory, the tasks set before a computer were organized into jobs. A prototypical computer (fig. 1) sat in a room waiting for jobs to be submitted to it, perhaps in the form of stacks of punched cards. The computer would execute each job in turn, hopefully producing output which could also have been stacks of punched cards. At least three problems arise in this scenario:

• Idleness. The computer sometimes had 
nothing to do and would be idle; idle time 
reduced the total amount of computation 
the computer could carry out. 

• Latency. Sometimes a job would be submitted while another job was running (as in job number 2 in the figure); in this case the job would join a queue of waiting jobs. This meant that the submitter of job 2 had to wait longer to harvest the output.

• Unanticipated data needed. For many types of jobs you might not be able to predict at the outset what data will be needed during the computation. The “job” model doesn’t offer a way for the computer to ask the operator for additional information it might need.
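The first two problems can be quantified with a small simulation of the batch model of fig. 1. In this sketch, written for illustration, a single processor runs jobs to completion in submission order; the job times in the example are invented.

```python
def fifo_schedule(jobs):
    """Batch model: one processor runs jobs to completion in submission order.

    jobs: list of (submit_time, run_time).
    Returns (waits, idle): each job's time spent queued, and total idle time.
    """
    clock = 0.0
    idle = 0.0
    waits = []
    for submit, run in sorted(jobs):
        if submit > clock:           # nothing to do until this job arrives
            idle += submit - clock   # idleness: lost computation time
            clock = float(submit)
        waits.append(clock - submit) # latency: time queued behind earlier jobs
        clock += run
    return waits, idle
```

For jobs submitted at times 0, 2 and 10 with run times 5, 3 and 1, the second job waits 3 time units behind the first, and the processor then sits idle for 2 units before the third arrives.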

The first two of these only impact the efficiency of computation, but the third requires that we go back and amend the job model altogether; so we will consider that first. Figure 2 shows an amended model of computation allowing interactive jobs that may stop execution part way through and ask the operator for more information. When this happens the job is considered stalled, and the computer sits idle.

Computer science’s answer to the problems of idleness and latency has been to introduce time-sharing and multiprocessing. Time-sharing is the practice of keeping several jobs in progress at the same time, so that when one job stalls or finishes, the processor’s time can then be devoted to some other job that needs running. Perhaps this second job will later stall or finish, but meanwhile the first job may have become runnable again (having received a new dose of the data it had stalled waiting for). The computer would then return to job 1. One could also fill idle time by keeping low-priority jobs waiting in the background (ones whose latency requirements were less strict) that would run whenever all higher-priority jobs were stalled.
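Time-sharing of this kind can be imitated with Python generators, where each yield hands the processor back to a round-robin scheduler. This is an illustrative sketch, not an operating-system scheduler: a stall is modeled as a single yield (the job is assumed to have received its data by the time its turn comes round again), and priorities are not modeled.

```python
from collections import deque


def job(name, steps, stall_at=None):
    """A job as a generator: each yield hands the processor back.

    Yielding "stall" means the job is waiting for data (e.g. from the operator).
    """
    for i in range(steps):
        if i == stall_at:
            yield "stall"
        yield f"{name}{i}"


def time_share(jobs):
    """Round-robin time-sharing: run one step of each runnable job in turn."""
    ready = deque(jobs)
    trace = []
    while ready:
        j = ready.popleft()
        try:
            step = next(j)
        except StopIteration:
            continue              # job finished: drop it, like a scrapped process
        if step != "stall":
            trace.append(step)    # useful work was done in this turn
        ready.append(j)           # back of the queue; a stalled job retries later
    return trace
```

Running job A (which stalls at its second step) alongside job B gives the order A0, B0, B1, A1, A2: the processor serves B while A is stalled instead of sitting idle.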

The advent of multiprocessors made it possible to further improve throughput in the same way that having several short-order cooks in a diner can speed orders. As the number of jobs and the number of available processors increase, there should be fewer wild swings in the availability of processing power to satisfy the needs of submitted jobs.

The chief tool for time-sharing and multiprocessing is an abstraction called a process, which can be thought of as a virtual computer. When a job is submitted, one creates a brand new (virtual) computer to carry it out, and once the job is finished, the virtual computer, or process, is scrapped. Each job may run in ignorance of all other jobs on the system. Each process gets its own memory and program to run, and its own program counter, abbreviated PC, that records where in the program the computer is now running. When the computer switches from running one process to another, the memory and PC (and other context) of the first process are retained so that they are available again when the first process is run again in the future.

Although at the outset we could consider all processes to operate in complete ignorance of each other, at some point the need will certainly arise for processes to intercommunicate. Computer science offers at least two paradigms that we will want to consider: message passing and shared memory (see fig. 3). Of these, the message-passing paradigm is less general but easier to analyze and make robust. In message passing, one process can simply send another a packet or a stream of data, which the second one may read at any later time. This is conceptually similar to how people intercommunicate. The chief difficulty with this paradigm is that it does not allow a process to interrogate another directly, except by sending a message and then stalling until a return message is received. This might greatly increase the latency of computations; worse yet, if we adopted this strategy for interrogation, two processes could conceivably interrogate each other at the same time, so that both end up deadlocked.
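The deadlock described above is easy to reproduce in a sketch. Here Python threads stand in for processes and `queue.Queue` objects for message channels; the `Proc` class and the timeout (standing in for an outside watchdog so the demo terminates) are illustrative assumptions, not part of any system discussed in the text.

```python
import queue
import threading

class Proc:
    """A toy process with separate mailboxes for questions and answers."""
    def __init__(self, name):
        self.name = name
        self.questions = queue.Queue()
        self.answers = queue.Queue()

def interrogate(me, peer, timeout):
    """Send a question to `peer`, then stall until an answer arrives.

    While stalled, `me` does not service its own question mailbox, so two
    processes that interrogate each other at once wait forever.
    """
    peer.questions.put((me.name, "what is your state?"))
    try:
        return me.answers.get(timeout=timeout)
    except queue.Empty:
        return "deadlocked"

def mutual_interrogation(timeout=0.2):
    a, b = Proc("A"), Proc("B")
    results = {}

    def run(me, peer):
        results[me.name] = interrogate(me, peer, timeout)

    threads = [threading.Thread(target=run, args=(a, b)),
               threading.Thread(target=run, args=(b, a))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(mutual_interrogation())  # both sides report a deadlock
```

Each process's question sits unanswered in the other's mailbox, which is exactly the situation the paragraph warns about.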

In the shared-memory paradigm, two processes communicate by reading from and writing to a shared area of memory. We can then arrange for one process to interrogate another simply by looking in the appropriate location in its memory (which, by prior agreement, we had arranged to share). But now we have to work hard to make sure that our two processes carry out their computations deterministically, because the order in which the two access the shared memory is not controlled. We would need to set up some convention to manage this. (One such convention could be to format the shared memory into message queues, thus returning us to the situation described in the previous paragraph.) In general, there is no final answer here; any paradigm will be either onerously restrictive or dangerously permissive, or both, and making good choices will require careful attention to the particulars of the task at hand.
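One common way to “format the shared memory into message queues” is a single-producer, single-consumer ring buffer. The sketch below uses a plain Python list as a stand-in for the shared block; the convention is that only the writer advances `tail` and only the reader advances `head`. In genuinely shared memory between processes, memory-ordering guarantees (the slot store becoming visible before the `tail` update) would also be needed, which this sketch does not model.

```python
class RingBuffer:
    """A single-producer, single-consumer message queue over a fixed block.

    Only the writer advances `tail`; only the reader advances `head`.
    One slot is always left empty, so head == tail means "empty".
    """

    def __init__(self, capacity):
        self.slots = [None] * (capacity + 1)
        self.head = 0  # next slot to read (owned by the reader)
        self.tail = 0  # next slot to write (owned by the writer)

    def write(self, msg):
        nxt = (self.tail + 1) % len(self.slots)
        if nxt == self.head:
            return False          # full: the writer must retry later
        self.slots[self.tail] = msg
        self.tail = nxt           # publish only after the slot is filled
        return True

    def read(self):
        if self.head == self.tail:
            return None           # empty
        msg = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        return msg

rb = RingBuffer(2)
rb.write("a")
rb.write("b")
print(rb.write("c"), rb.read())  # False a
```

Because each index has exactly one owner, the two sides never write the same variable, which is what makes the access order manageable.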

3 Electronic music terminology 

The first widely used model for computer music performance was what is now called Music N, developed over a series of programs written by Max Mathews starting in 1957 [Mathews, 1969]; by 1959 his Music 3 program essentially put the idea in its modern form, as exemplified in Barry Vercoe’s Csound program. These programs all act as “music compilers” or “renderers”, taking a fixed text input and creating a soundfile as batch output. Although Csound has provisions for using real-time inputs as part of its “rendering” process, in essence the programming model is not interactive.

Figure 4: The Music N paradigm

Music N input is in the form of an orchestra and a score, as shown in fig. 4. The orchestra can be thought of as emulating a 1950s-era elektronische Musik studio, in which the hardware is organized in metal boxes with audio inputs and outputs, such as tape recorders, oscillators, pulse generators, filters, ring modulators, and so on. These would be connected by audio cables into a patch. Furthermore, the boxes had knobs and switches on them that allowed the user to supply parameters such as the frequency of an oscillator.

In the Music N paradigm, the studio and its patch are represented by the orchestra. Although the actual orchestra file is written in a programming language, when algorithms are published it has traditionally been represented as a block diagram showing a collection of unit generators and the audio connections between them.

The score is organized as a list of time-tagged records that are (either nostalgically or deprecatingly) called score cards. In addition to one or two time tags (a “note” has two, one for its start and one for its end), a score card has some number of numerical parameters that may be supplied to the unit generators. The score is like a process in that it runs sequentially in time. Unlike the computer science notion of a process, however, the score advances and waits according to timing information in the score cards. Each score card has an associated logical time at which it is run.
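The orchestra/score split can be illustrated with a minimal batch renderer. This is not Csound syntax: the card layout (start, duration, frequency, amplitude), the single sine-oscillator “instrument” and the small sample rate are assumptions chosen to keep the sketch short.

```python
import math

SR = 8000  # sample rate in Hz, kept small for the example

def oscillator(freq, amp, n, sr=SR):
    """A unit generator: n samples of a sine wave."""
    return [amp * math.sin(2 * math.pi * freq * i / sr) for i in range(n)]

def render(score, total_seconds, sr=SR):
    """Batch-render a score of (start, duration, freq, amp) cards.

    Each card is handed to the orchestra (here one oscillator instrument) at
    its logical start time, and the notes are mixed into a single buffer,
    like a soundfile written at the end of the run.
    """
    out = [0.0] * int(total_seconds * sr)
    for start, dur, freq, amp in sorted(score):
        first = int(start * sr)
        for i, s in enumerate(oscillator(freq, amp, int(dur * sr), sr)):
            if first + i < len(out):
                out[first + i] += s
    return out

score = [(0.0, 0.5, 440.0, 0.5), (0.25, 0.5, 660.0, 0.25)]
soundfile = render(score, 1.0)
print(len(soundfile))  # 8000 samples
```

Nothing in the loop looks at the outside world: time is purely logical, which is why the model is not interactive.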

Things get interesting when we try to adapt this paradigm to run in real time. We could simply connect the Music N output to a real-time audio output; but presumably our reason for wanting to run in real time is to be able to use live inputs to affect the sound output. Taking the opposite direction, we could require that the user or musician supply all the parameters in real time using knobs and switches, but this quickly proves unmanageable for the human. We will need to make intelligent decisions, probably different for any two musical situations, as to how the live inputs will affect the production of sound. More generally, our problem is to design a software environment that gives a musician the freedom to make these choices.

In the early 1980s two influential real-time synthesizers were designed: the Systems Concepts Digital Synthesizer (or “Samson Box”) at Stanford [Loy, 1981], and the 4C synthesizer at IRCAM [Moorer et al., 1979][Abbott, 1981]. Both machines ran a fixed computation loop with a fixed number of steps, with one loop finishing at each tick of the sample clock.

Each of these machine designs got some things right for the first time. The Samson Box was the first working machine that could do sample-accurate parameter updates in real time. To do this, the fixed program contained an update mechanism in which items were taken off the head of a time-tagged parameter update queue. This queue was filled by the controlling Foonly computer some tenths of seconds, or whole seconds, in advance, so that the Foonly did not have to perform parameter updates synchronously. This approach had one major limitation: it did not take into account the possibility of real-time interaction. It was physically possible to jump the queue for “real-time” parameter updates, but then one lost any ability to determine the timing of such updates accurately.
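The Samson Box’s time-tagged update queue can be sketched as a priority queue drained at the top of a fixed per-sample loop. The update format `(sample_time, name, value)` and the parameter names are hypothetical; the point is only that every update takes effect exactly at its tagged sample.

```python
import heapq

def run_fixed_loop(n_samples, updates, params):
    """Simulate a fixed per-sample loop fed by a time-tagged update queue.

    updates: (sample_time, name, value) tuples prepared in advance by the
    controlling computer. Each update takes effect at the start of its
    sample tick, so timing is sample-accurate.
    """
    pending = list(updates)
    heapq.heapify(pending)
    history = []
    for t in range(n_samples):
        # Pop every update whose time tag has come due.
        while pending and pending[0][0] <= t:
            _, name, value = heapq.heappop(pending)
            params[name] = value
        history.append(dict(params))   # stand-in for computing one sample
    return history

h = run_fixed_loop(5, [(3, "amp", 0.5), (1, "freq", 220.0)],
                   {"amp": 1.0, "freq": 440.0})
print([s["freq"] for s in h])  # [440.0, 220.0, 220.0, 220.0, 220.0]
```

A “real-time” update that jumps the queue would have to be tagged with whatever sample the loop happens to be on when it arrives, which is where the timing accuracy is lost.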

The 4C machine and its controlling software 4CED were more explicitly designed with real-time interaction in mind, although the timing was less accurate than with the Samson Box. In the 4C, parameter updates were effected at interrupt level from the controlling computer; the computer was interrupted by the 4C when one of a bank of timers ran out.

The 4CED user conceptualized the 4C as a collection of 32 independent processes (Abbott’s simile was a collection of 32 music boxes that the user could start at any time). The guiding idea seems to have been that a performer could play a keyboard with 32 keys on it, but each key, rather than being restricted to playing a single note, could in turn set off a whole sequence of actions. This would seem to greatly magnify what the keyboard player could do.

Figure 5: The 4C and the 4CED environment
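The “music box” idea can be sketched as follows: each key starts an independent timed sequence, and a heap of due times stands in for the 4C’s bank of hardware timers. The box contents, the event format and the class name are hypothetical illustrations, not 4CED’s actual design.

```python
import heapq

class MusicBoxes:
    """Each key starts an independent timed sequence rather than one note."""

    def __init__(self, boxes):
        self.boxes = boxes        # key -> list of (delay, action)
        self.timers = []          # heap of (due_time, seq, key, step)
        self.seq = 0              # tie-breaker for simultaneous timers

    def press(self, key, now):
        self._arm(key, 0, now)

    def _arm(self, key, step, now):
        if step < len(self.boxes[key]):
            delay, _ = self.boxes[key][step]
            heapq.heappush(self.timers, (now + delay, self.seq, key, step))
            self.seq += 1

    def advance(self, until):
        """Fire every timer that runs out up to time `until`."""
        fired = []
        while self.timers and self.timers[0][0] <= until:
            due, _, key, step = heapq.heappop(self.timers)
            fired.append((due, self.boxes[key][step][1]))
            self._arm(key, step + 1, due)  # delays are relative to the previous event
        return fired

m = MusicBoxes({1: [(0, "C4"), (100, "E4"), (100, "G4")],
                2: [(0, "A3"), (250, "A2")]})
m.press(1, 0)
m.press(2, 50)
print(m.advance(400))
```

The two sequences interleave in time as if they were two independent processes, which is the behavior Abbott’s music-box simile describes.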

It also seems to have been on people’s minds that the playing of sequences could usefully be unified with the business of scheduling breakpoints for a pitch or amplitude envelope consisting of many segments. Both the sequencing of collections of notes and the sequencing of envelope breakpoints led many computer music researchers to think that a process model, as would appear in a time-sharing operating system, was a perfect metaphor to reuse in the design of real-time computer music control systems.

Both the Samson Box and the 4C maintained lists of parameter updates that resemble Music N scores in that they have sequences of numeric updates for synthesis parameters. In the case of the 4C, the scheduling of the updates could depend on real-time inputs. Both these systems, but particularly 4CED, modeled musical sequences in ways that resembled processes in the computer science sense.

4 Processes in modern computer music environments

The four most widely used computer music environments are probably Csound, Max, SuperCollider, and Pd. (Since this is a Linux conference, we won’t consider Max here, but only the closely related Pd.) Of all these, SuperCollider is unique in that it explicitly adopts a process-like model, which offers at least two advantages. First, it allows the user to “think” in processes in order to express the parallelism that is desirable for polyphony, for instance in voice banks or collections of sinusoids in additive synthesis. Second, it allows parallelism in the signal processing engine so that multiprocessors can be exploited.

A newer environment, ChucK [Wang and Cook, 2003], also uses a thread model and aims in part to make the creation and destruction of processes (named “shreds”) as lightweight as possible. This language is still under active development and may lead to new ideas for adapting the concept of process to interactive computer music environments.

The process models in both SuperCollider and ChucK lend themselves well to generative applications, where processes may need to quickly and efficiently create new voices or instances of computational algorithms. In both environments the creation and destruction of processes is highly optimized so that large numbers of them may be created and managed dynamically.
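To show why cheap process creation suits generative work, here is a small cooperative scheduler in logical time, loosely in the spirit of ChucK shreds or SuperCollider routines but using neither language’s API. A “shred” is assumed to be a Python generator that yields `("wait", dt)`, `("spawn", gen)` or `("note", event)`; starting one costs only a generator object and a heap entry.

```python
import heapq
import itertools

def voice(pitch, dur):
    """One note as its own shred: note-on, wait, note-off."""
    yield ("note", ("on", pitch))
    yield ("wait", dur)
    yield ("note", ("off", pitch))

def arpeggiator(pitches, step):
    """A generative shred that spawns one voice shred per pitch."""
    for p in pitches:
        yield ("spawn", voice(p, 3 * step))
        yield ("wait", step)

def run_shreds(root, until):
    counter = itertools.count()          # tie-breaker: FIFO order at equal times
    ready = [(0, next(counter), root)]
    log = []
    while ready:
        now, _, shred = heapq.heappop(ready)
        if now > until:
            break
        try:
            kind, arg = next(shred)
        except StopIteration:
            continue                     # finished shreds are simply dropped
        if kind == "note":
            log.append((now,) + arg)
            heapq.heappush(ready, (now, next(counter), shred))
        elif kind == "wait":
            heapq.heappush(ready, (now + arg, next(counter), shred))
        elif kind == "spawn":
            heapq.heappush(ready, (now, next(counter), arg))    # the new shred
            heapq.heappush(ready, (now, next(counter), shred))  # its parent
    return log

print(run_shreds(arpeggiator([60, 64, 67], 1), until=100))
```

The overlapping notes each live in their own shred, so polyphony falls out of the process model rather than needing a voice bank.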

Pure Data, on the other hand, offers no model of a process and is therefore badly adapted to expressing polyphony (although this is fixable using a voice bank management object available in Pd extended but not yet in “vanilla”). It is even less well adapted to expressing generative algorithms in which data may fork and recombine in ways that SuperCollider and ChucK make easy. It is perhaps the most distinguishing feature of Max and later Pd that they both radically did away with the notion of process altogether.

Pd offers no easy way to manage parallelism either; the facility provided is the “pd~” object, which can be considered a throwback to the Max/FTS solution from 1990. There are advantages gained by this trade-off. Most importantly (in my view at least), the fluency and ease with which Pd patches can react to input from the outside world is much greater. This is partly because of the absence of a process model: one never has to consider how different processes must be synchronized in order to react consistently to new and possibly unpredictable inputs.

5 Conclusion 

The notions of “process” and “thread” seem eternally attractive to designers of real-time computer music programming environments such as the ones discussed here. The attraction seems to be for both expressive reasons (as a way to describe polyphony, particularly in generative situations) and efficiency reasons (as ways to efficiently exploit parallelism in general-purpose processors). Yet the difficulties of managing coordination between processes or threads still make it appear impossible to adapt them easily to an environment like Pd. This is a hard problem that is worthy of future work.

Meanwhile, the most powerful arithmetic processors in modern devices are their graphics processors. We don’t yet have a good understanding of how to exploit these architectures for computer audio, and indeed this seems so far from today’s programming models that it is hard to see where we could start.

It seems that the state of the art in programming environments for interactive computer music is out of sync with current developments in computing. Past efforts to make music out of computer hardware and operating systems that were often ill suited to the task have often resulted in advances that had implications not only for computer musicians but for computer science as well. Attacking the current situation in a similar way might likewise give rise to useful new ideas.
Abstract

FaustLive is a standalone just-in-time Faust compiler. It tries to bring together the convenience of a standalone interpreted language with the efficiency of a compiled language. Based on libfaust, a library that provides a full in-memory compilation chain, FaustLive doesn’t require any external tool (compiler, linker, etc.) to translate Faust source code into binary executable code.

Thanks to this technology, FaustLive provides several advanced features. For example, while a Faust application is running, it is possible to modify its behavior on the fly without any sound interruption. It is also possible to migrate a running application from one machine to another.

Keywords 

Audio, Faust, DSP programming, remote processing and interfacing

1 Introduction 

Faust [Functional Audio Stream] [6] is a functional, synchronous, domain-specific programming language specifically designed for real-time signal processing and synthesis. A unique feature of Faust, compared to other existing music languages like Max, Pd, SuperCollider, etc., is that programs are not interpreted, but fully compiled. Faust provides a high-level alternative to C/C++ for implementing efficient sample-level DSP algorithms.

But while compilers have the advantage of efficiency, they have their own drawbacks compared to interpreters. Compilers traditionally require a whole chain of tools to be installed (compiler, linker, development libraries, etc.). For non-programmers this task can be complex. The development cycle, from editing the source code to a running application, is much longer with a compiler than with an interpreter. This can be a problem in creative situations where quick experimentation is essential. Moreover, binary code is usually not compatible across platforms and operating systems.

FaustLive is an attempt to bring together the convenience of a standalone interpreted language with the efficiency of a compiled language. Based on libfaust, a library that provides a full in-memory compilation chain, FaustLive is a standalone application that doesn’t require any external tool to translate Faust source code into binary executable code and run it. In many respects FaustLive behaves like a Faust interpreter with a very short development cycle (not very different, in that respect, from modern compiled LISP environments, or from the approach presented by Albert Graef with Pure in [1]).

Moreover, FaustLive provides some advanced features to speed up the development cycle. For example, while a Faust application is running, it is possible to edit and recompile its Faust code on the fly, without any sound interruption. If the application is using JACK as its driver, all audio connections are maintained. Another interesting feature is the possibility to migrate a running application from one machine to another through the network, even across operating systems. Applications can also be controlled remotely, using HTTP or OSC.

FaustLive offers a lot of flexibility for prototyping audio applications. It can also be connected to Faust Web, a remote compilation service, to export the application as a traditional binary for one of the various operating systems and audio architectures supported by the Faust ecosystem.

Since FaustLive is based on libfaust, the Faust compiler project will first be presented (see Section 2). Then FaustLive will be briefly described through a typical use case (see Section 3), before its technical aspects are detailed (see Section 4).
2 Faust Compiler

The Faust compiler translates a Faust program into an equivalent imperative program (typically C, C++, Java, etc.), taking care of generating efficient code. The Faust package also includes various architecture files, providing the glue between the generated code and the external world (audio drivers and user interfaces).

The current version of the Faust compiler produces the resulting DSP code as a C++ class, to be inserted in the architecture file. The resulting C++ file is finally compiled with a regular C++ compiler to produce the final executable program or plug-in (Figure 1).

Figure 1: Steps of Faust compilation chain (architecture file, Faust compiler, C++ file, gcc compiler, executable application or plug-in)

The resulting application is structured as shown in Figure 2. The DSP has become an audio computation module, while the architecture provides the links to the user interface and the audio driver.

Figure 2: Faust application structure

2.1 LLVM

LLVM (formerly Low Level Virtual Machine) is a compiler infrastructure designed for compile-time, link-time and run-time optimization of programs written in arbitrary programming languages. Executable code is dynamically produced using a “Just In Time” compiler from a specific code representation, called LLVM IR 1. Clang, the “LLVM native” C/C++/Objective-C compiler, is a front-end for the LLVM compiler. It can, for instance, convert a C or C++ source file into LLVM IR code (Figure 3).

Figure 3: LLVM compiler structure

Domain-specific languages like Faust can easily target the LLVM IR. This has been done by developing a special LLVM IR backend in the Faust compiler [5].

2.2 Dynamic compilation chain

The complete chain goes from the DSP source code, compiled into LLVM IR using the LLVM backend, to the executable code finally produced by the LLVM JIT. All steps are done in memory. Pointers to executable functions can be retrieved in the resulting LLVM module, and their code called directly with the appropriate parameters.

In the faust2 development branch, the Faust compiler has been packaged as an embeddable library called libfaust, published with an associated API [2]. This API imitates the concepts of object-oriented languages like C++. The compilation step, usually executed by gcc, is accessed through the function createDSPFactory. Given Faust source code (as a file or a string), the compilation chain (Faust + LLVM JIT) generates the “prototype” of the class, called llvm-dsp-factory. Then the function createDSPInstance, corresponding to “new className” in C++, instantiates an llvm-dsp. It can then be used like any object: run and controlled through its interface.

Embedding this technology in a program or a plug-in enables dynamic modifications of the audio computation module of a Faust application [4].

1 The Intermediate Representation is an intermediate SSA representation.
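The factory/instance split of the libfaust API can be illustrated with a Python analogue. This is not libfaust: `create_dsp_factory`, `create_dsp_instance`, the `PARAMS` convention and the use of Python’s built-in `compile`/`exec` as a stand-in for the Faust + LLVM JIT chain are all hypothetical. The sketch only shows the idea that compilation happens once, in memory, while each instance owns its own control values. Only trusted source should ever be passed to `exec`.

```python
class DSPFactory:
    """Hypothetical analogue of llvm-dsp-factory: compiled code, held once."""

    def __init__(self, source):
        namespace = {}
        # In-memory compilation; no external compiler or linker is involved.
        exec(compile(source, "<dsp>", "exec"), namespace)
        self.compute_fn = namespace["compute"]
        self.params = namespace.get("PARAMS", {})

class DSPInstance:
    """Hypothetical analogue of llvm-dsp: per-instance control values."""

    def __init__(self, factory):
        self.factory = factory
        self.controls = dict(factory.params)   # each instance owns its state

    def compute(self, block):
        return [self.factory.compute_fn(x, **self.controls) for x in block]

def create_dsp_factory(source):
    return DSPFactory(source)

def create_dsp_instance(factory):
    return DSPInstance(factory)

src = "PARAMS = {'gain': 0.5}\ndef compute(x, gain):\n    return x * gain\n"
factory = create_dsp_factory(src)
a, b = create_dsp_instance(factory), create_dsp_instance(factory)
b.controls["gain"] = 2.0
print(a.compute([1.0, 2.0]), b.compute([1.0, 2.0]))  # [0.5, 1.0] [2.0, 4.0]
```

Compiling once and instantiating many times is what lets an application create several running copies of the same DSP cheaply.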
3 FaustLive - Use Case


FaustLive is a QT-based 2 application that launches Faust applications directly from their source code, without having to precompile them (Figure 4).




Figure 4: FaustLive principle

Figure 5: Behavior modification (from FOO to BAA)

Figure 6: Dynamic source edition


FaustLive exploits dynamic compilation, combined with multiple interfacing systems and audio drivers, to modify the structure of Faust applications and simplify the Faust prototyping process.

To give an idea of FaustLive’s potential, the following section presents its various features, showing the corresponding changes in the structure of the applications.

The starting point of FaustLive’s features is drag and drop. A Faust DSP can be opened in a new window, or it can be dropped on a running Faust application. As a result, an intermediate state emerges in which the two applications coexist. The incoming application copies the established audio connections. Then the output of the old application is crossfaded into the new one (Figure 5). Finally, the dropped application permanently replaces the previous one. With this system, a running application can be changed endlessly, without audio clicks.
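The crossfade step can be sketched as a block-by-block mix of the outgoing and incoming DSP outputs. The text does not give FaustLive’s fade curve or length, so the linear gain and the `fade_len` parameter below are assumptions.

```python
def crossfade(old_block, new_block, fade_len, pos):
    """Mix one audio block from the outgoing and incoming DSPs.

    pos counts the samples of the fade already done; returns the mixed
    block and the updated position. Linear gains that always sum to 1
    (an assumption; the real curve may differ).
    """
    out = []
    for o, n in zip(old_block, new_block):
        g = min(pos / fade_len, 1.0)     # gain applied to the new DSP
        out.append((1.0 - g) * o + g * n)
        pos += 1
    return out, pos

block, pos = crossfade([1.0] * 4, [0.0] * 4, fade_len=4, pos=0)
print(block)  # [1.0, 0.75, 0.5, 0.25]
```

Once `pos` reaches `fade_len`, only the new DSP is heard and the old one can be released, which is the point where the dropped application permanently takes over.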

This mechanism also allows source editing. When the user chooses to edit the Faust code, it is opened in a text editor, and as the changes are saved, the application is updated using the crossfade mechanism (Figure 6).


2 QT is a framework for interface design 


JACK was initially adopted as the audio driver because it allows users to connect their Faust applications to each other. Other drivers were then added, making this component of the structure as flexible as the others. While Faust applications are running, FaustLive makes it possible to switch the audio driver dynamically, without stopping FaustLive. The migration is made during execution and is applied to every running Faust application. JACK, NetJack, CoreAudio and PortAudio are the drivers integrated in FaustLive (Figure 7).



Figure 7: Dynamic driver migration (CoreAudio and PortAudio domains)


FaustLive extends its reach to external interactions. A smartphone can open an OSC 3 interface to control the application remotely (Figure 8).

Figure 8: OSC interface

Likewise, an HTML interface is accessible through a QR code 4. By scanning it with a touchpad (for instance), the remote interface is opened in a browser. In both cases, the interface is duplicated and the local and remote interfaces are kept synchronized.

3 OSC: Open Sound Control

4 QR code (abbreviated from Quick Response Code) is the trademark for a type of matrix barcode (or two-dimensional barcode).

The HTML interface has an additional benefit: it is set up to enable drag and drop. Therefore, the user controlling the remote interface can also change the behavior of the application by dropping his own DSP. It is sent to the local application, where it replaces the running one using the crossfade mechanism. Finally, the remote interface is updated (Figure 9).

Figure 9: Remote drop

If many and/or heavy Faust applications are opened, the local CPU can become saturated. Migrating calculations to other machines can lighten this load. Thanks to dynamic compilation, the audio computation module can be relocated to another machine (Figure 10). The list of available remote servers is built dynamically, so it is simple to switch from local processing to remote processing.

Figure 10: Remote processing

A user may wish to run his Faust application in another environment (Max/MSP, SuperCollider, ...). For that purpose, a link to Faust Web, a remote compilation web service, is integrated in FaustLive. The user only has to choose the platform and environment he wishes to target. In return, he receives the binary of the requested application or plug-in.

When FaustLive exits, the last configuration is saved and restored at its next execution. A user may also save the state of the application at any moment. Later, he can reload this snapshot, either by importing it into the current state or by recalling it (Figure 11).
Figure 11: Reloading snapshot (current state and resulting state)
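The two ways of reloading a snapshot can be sketched as follows. The text does not define them precisely; here “recall” is assumed to replace the whole session and “import” to add the snapshot’s windows to the open ones. The JSON session layout and the prime-suffix renaming rule are hypothetical.

```python
import json

def save_snapshot(state):
    """Serialize a session: window name -> {"source": ..., "params": {...}}."""
    return json.dumps(state, sort_keys=True)

def reload_snapshot(current, snapshot_text, mode):
    snap = json.loads(snapshot_text)
    if mode == "recall":
        return snap                     # assumed: the snapshot replaces the session
    if mode == "import":
        merged = dict(current)          # assumed: snapshot windows are added
        for name, window in snap.items():
            key = name
            while key in merged:        # hypothetical rule to avoid clobbering
                key += "'"
            merged[key] = window
        return merged
    raise ValueError(f"unknown mode: {mode}")

current = {"osc": {"source": "osc.dsp", "params": {"freq": 440}}}
snap = save_snapshot({"osc": {"source": "osc.dsp", "params": {"freq": 220}},
                      "rev": {"source": "rev.dsp", "params": {}}})
print(sorted(reload_snapshot(current, snap, "import")))  # ['osc', "osc'", 'rev']
```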

4 FaustLive - Technical View 
4.1 Basic FaustLive Features 

The first aim of FaustLive is to create a dynamic environment for Faust prototyping by embedding libfaust. The resulting dynamic compilation chain (Figure 12) has the advantage of speeding up the compilation process. Since it returns the executable application almost immediately, this compiler is a stepping stone for dynamic behaviors.



Figure 12: Compilation chain in FaustLive 


Now that Faust code can be compiled dynamically, new possibilities arise. A user may drop his Faust code, as a file, a string or a URL, on a running application. The code is then immediately given to the embedded compiler, and the new application replaces the previous one. Since FaustLive is designed for dynamic use, it is very important to ensure continuity in the sound. For that purpose, a crossfade is calculated between the outgoing and incoming Faust applications.

Moreover, a Faust application is linked to its source, so that any modification of the Faust code leads to a recompilation. This aspect is central, because it simplifies the prototyping process: a user can modify his code at leisure and see/hear the result instantly.
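Linking an application to its source amounts to watching the file and recompiling when it changes. The text does not say how FaustLive detects changes; the modification-time polling below, and the `SourceWatcher` name, are assumptions for illustration.

```python
import os

class SourceWatcher:
    """Poll a DSP source file and report when it has been modified."""

    def __init__(self, path):
        self.path = path
        self.last = os.stat(path).st_mtime_ns

    def changed(self):
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self.last:
            self.last = mtime
            return True
        return False
```

In an event loop, a `True` from `changed()` would trigger a recompilation followed by the crossfade to the new DSP; a production tool would more likely use the operating system’s file-change notifications than polling.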

An important asset of FaustLive is the coexistence of multiple Faust applications, in contrast with the QT-JACK architecture from the Faust “static” distribution, where every Faust program has to be compiled separately to produce its own application. Here, each application evolves with the actions it undergoes and has its own set of dynamic parameters (Figure 13).



Figure 13: FaustLive’s environment 


4.2 Audio Drivers 

FaustLive integrates JACK, CoreAudio, NetJack and PortAudio 5, so it is possible to switch audio drivers or modify their parameters (such as buffer size or sample rate) while FaustLive is running. Every running audio client is stopped, then the applications are transferred to the new domain and finally restarted.

4.2.1 JACK 

JACK is a system for handling real-time, low-latency audio (and MIDI). It runs on GNU/Linux, Solaris, FreeBSD, OS X and Windows. It can connect a number of different applications to an audio device, as well as allowing them to share audio between themselves.

5 JACK, CoreAudio and NetJack are used on OSX, 
JACK and NetJack on Linux, PortAudio, JACK and 
NetJack on Windows. 
An interesting constraint in using JACK is the matter of connections. Once connections have been established, the objective is to maintain them even if the Faust application in a window changes. If the new application has more ports than the previous one, the user will have to make the additional connections himself.
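Carrying connections over a DSP swap can be sketched as a mapping by port index. Matching by index is an assumption (the text only says that connections are maintained and that extra ports must be connected by hand); peer names such as `system:playback_1` follow the usual JACK naming style.

```python
def carry_over_connections(old_conns, new_ports):
    """Re-create connections after the DSP in a window is replaced.

    old_conns: list of (port_index, peer_port) from the old application.
    new_ports: number of ports (same direction) on the new application.
    Connections whose port index still exists are kept; extra ports on the
    new application are returned unconnected, for the user to patch.
    """
    kept = [(i, peer) for i, peer in old_conns if i < new_ports]
    used = {i for i, _ in old_conns}
    unconnected = [i for i in range(new_ports) if i not in used]
    return kept, unconnected

old = [(0, "system:playback_1"), (1, "system:playback_2")]
print(carry_over_connections(old, 3))
```

With three output ports instead of two, the first two keep their connections and port 2 is left for the user to connect.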

4.2.2 NetJack 

NetJack is a real-time audio transport over a generic IP network, fully integrated into JACK. NetJack synchronizes all clients to one sound card, so there is no resampling and there are no glitches anywhere in the network. The master imposes the sample rate and buffer size, according to its audio device.

4.2.3 CoreAudio and PortAudio 

Because the protocol has to be strictly the same on the client and the server side, JACK and NetJack have to be linked as dynamic libraries. The problem this brings is that FaustLive’s installation depends on JACK’s installation. To avoid this inconvenience for beginners, CoreAudio 6 and PortAudio 7 versions have been developed. Since these are included in the standard libraries or easily linked as a DLL, they do not add to the user’s work.

4.3 Control Interfaces 

To offer a modular application, FaustLive widens the user’s choice of control interface.

4.3.1 OSC Interface 

The OSC protocol is integrated into FaustLive to offer another type of interface and enable interoperability. Many audio environments and devices implement this protocol, so FaustLive is able to communicate with them. The user can configure the port on which the protocol is started and then control the interface with, for instance, an OSC touch application.


6 CoreAudio is the digital audio infrastructure of iOS 
and OS X. It provides a framework designed to handle 
audio needs in applications. 

7 PortAudio is a free, cross-platform, open-source,
C/C++ audio I/O library. It is intended to promote 
the exchange of audio software between developers on 
different platforms. 


4.3.2 HTML Interface 

The Faust HTML interface is also a component of FaustLive. Loaded in any browser, this interface controls the DSP’s parameters through an HTTP connection. When it is built, a server is started, taking care of delivering the HTML page (Figure 14). Synchronization between the local and the remote interface is also ensured.

To ease the opening of the interface, a QR code is built from the HTTP address using libqrencode. Most smartphones and portable devices have a QR code decoder. By scanning the QR code, a browser gets connected to the interface page.

4.3.3 Preferences 

The challenge FaustLive faced was to provide an interface that gives the user as much freedom as possible while remaining easy to grasp. In that direction, the OSC and HTTP ports are configurable in the window’s options. The window toolbar is collapsed by default, so that a “basic” user won’t feel overwhelmed by preferences (Figure 13). Both protocols use 5510 as the default port. When the TCP listening port number is busy, the system automatically looks for the next available port number.
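The “next available port” behavior can be sketched with the standard socket API: try to bind the default port and, if it is busy, try the following one. The default of 5510 comes from the text; the attempt limit and the loopback host are assumptions for the example.

```python
import socket

def bind_first_free_port(start=5510, attempts=100, host="127.0.0.1"):
    """Bind a TCP listening socket on `start`, or on the next free port.

    Returns (socket, port). The caller is responsible for closing the socket.
    """
    for port in range(start, start + attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            sock.close()          # port busy: try the next number
            continue
        sock.listen()
        return sock, port
    raise OSError(f"no free port in {start}..{start + attempts - 1}")
```

Binding, rather than merely checking whether a port looks free, avoids a race in which another program grabs the port between the check and the bind.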



Figure 14: HTML interface with control and dropping services (full HTML page)


4.3.4 Remote Drag and Drop 

Like the rest of the current Faust distribution, the HTML interface has a “static” behavior. The intention to replicate FaustLive’s dynamic behavior led to adding a dropping area to the HTML interface. This HTTP service is independent and specific to FaustLive. The server, started by FaustLive, is able to create an HTML page that encapsulates the remote interfaces. The resulting service of remote interface and DSP drop has the following address: http://IP:DroppingPort/InterfacePort (Figure 14).

The dropping port is set in the preferences 
and is common to all the Faust applications. 
The remote interface port is distinct for every 
Faust application and editable in the window’s 
options. 

The reaction to the drop follows FaustLive’s model. The DSP is first sent to FaustLive as an HTTP POST request. The DSP is compiled and replaces the previous one after the crossfade. Finally, the remote interface is updated.

4.4 Remote Processing 

To widen its benefits, FaustLive enables remote processing. The compilation and the audio computation are redirected to a remote machine, so that the local CPU load can be lightened.

On a remote machine, an application starts an HTTP server offering the remote compilation/processing service. This server waits for requests.

On the client side (FaustLive), a “proxy” API makes it transparent to create a remote-dsp rather than a local llvm-dsp (cf. Section 2.2). This API, libfaustremote, takes care of establishing the connection with the server.

The first step (compilation) is carried out by the function createRemoteDSPFactory. The code is sent to the server, which compiles it and creates the “real” llvm-dsp-factory. The remote-dsp-factory returned to the user is an image of the “real” factory. Before the Faust code is sent, a Faust-to-Faust compilation step is executed locally to resolve all the dependencies. This way, the expanded code sent to the server is self-contained.
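The idea behind the Faust-to-Faust step, producing one self-contained source, can be illustrated by inlining `import("...")` statements recursively. This is a simplification: FaustLive uses the Faust compiler itself for this step, and real Faust resolution (search paths, environments, `library`/`component`) is more involved. The in-memory `files` dictionary stands in for the file system.

```python
import re

# Matches a whole line of the form: import("name");
IMPORT = re.compile(r'^[ \t]*import\("([^"]+)"\);[ \t]*$', re.MULTILINE)

def expand(name, files, seen=None):
    """Inline every import("...") so that the result is self-contained.

    files maps file names to source text. Each file is included at most
    once, which also stops import cycles.
    """
    seen = set() if seen is None else seen
    if name in seen:
        return ""
    seen.add(name)

    def inline(match):
        return expand(match.group(1), files, seen)

    return IMPORT.sub(inline, files[name])

files = {
    "main.dsp": 'import("filters.lib");\nprocess = lp;\n',
    "filters.lib": 'import("maths.lib");\nlp = _ : *(0.5);\n',
    "maths.lib": "PI = 3.14159;\n",
}
print(expand("main.dsp", files))
```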

The remote-dsp-factory can then be instantiated to create remote-dsp instances, which may run in the chosen audio/visual architecture (here, FaustLive).

To let the interface be created locally, the server returns a JSON-encoded interface. The function buildUserInterface can then be recreated, so that a remote-dsp appears to work just like a local llvm-dsp.
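The following minimal Python sketch shows how an interface can be rebuilt locally from a JSON description. It assumes a simplified JSON layout loosely modeled on Faust's JSON interface description: a `ui` list of groups with `type`, `label` and `items` fields, and widgets that carry `init`, `min` and `max`. The real format and the libfaustremote API may differ. `RecordingUI` and `rebuild_interface` are hypothetical names. The builder stands in for a real user interface and only records the calls a buildUserInterface-style traversal would make.

```python
import json


class RecordingUI:
    """Hypothetical UI builder: records the calls a buildUserInterface traversal makes."""

    def __init__(self):
        self.calls = []

    def open_box(self, kind, label):
        self.calls.append(("open", kind, label))

    def close_box(self):
        self.calls.append(("close",))

    def add_widget(self, kind, label, init, lo, hi):
        self.calls.append(("widget", kind, label, init, lo, hi))


GROUP_TYPES = {"vgroup", "hgroup", "tgroup"}  # assumed group type names


def rebuild_interface(items, ui):
    """Walk the (assumed) JSON tree and replay it on a local UI builder."""
    for item in items:
        if item["type"] in GROUP_TYPES:
            ui.open_box(item["type"], item["label"])
            rebuild_interface(item.get("items", []), ui)
            ui.close_box()
        else:
            ui.add_widget(item["type"], item["label"],
                          item.get("init", 0.0), item.get("min", 0.0), item.get("max", 1.0))


# A reply such as a compilation server might return (illustrative only).
reply = json.loads("""
{"ui": [{"type": "vgroup", "label": "osc", "items": [
  {"type": "hslider", "label": "freq", "init": 440, "min": 20, "max": 2000},
  {"type": "button", "label": "gate"}]}]}
""")

ui = RecordingUI()
rebuild_interface(reply["ui"], ui)
```

Because the traversal only depends on the description, the same code works whether the JSON comes from a local or a remote compiler. This reflects the transparency the proxy API aims for.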

Moreover, the audio processing is redirected through a NetJack connection. The audio data is sent to the remote machine, which processes it and sends back the results. In addition to the standard audio flow, one MIDI port is used to transfer the controller values (Figure 15). The benefit of this solution is that synchronized audio and controller data are transmitted over the same connection. Furthermore, the audio samples can be encoded using different audio data types: float, integer, and compressed audio (using the OPUS codec 8).



Figure 15: Remote compilation 


libfaustremote uses libcurl to send HTTP requests to the remote server, where they are handled with libmicrohttpd.

The remote processing service is accessed directly from FaustLive’s windows. The Zeroconf protocol is used to scan for remote machines offering the service, and a list of the available ones is built dynamically. By browsing this list, the user can easily switch from one machine to another, or return to local processing.

4.5 Faust Web 

To make Faust compilation more accessible, a web service for remote compilation has been designed. It receives a Faust DSP and returns a plugin or application for the chosen target architecture. As a result, the user no longer needs to install the Faust package and all the additional SDKs on their machine: anyone can write a Faust application, send it to the server and receive a plugin.

8 http://www.opus-codec.org

This service is accessible from a browser but requires several requests. FaustLive makes the export easier. A menu is built dynamically with the available platforms and architectures, and once the user has made a choice, the code is sent to the server. The first step is syntax verification, which returns a SHA-1 key that can be used for multiple requests. The second step is compilation, using the standard “static” chain, which returns the chosen application to the user (Figure 16).
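The export exchange summarized in Figure 16 amounts to three requests. The Python sketch below only builds that request sequence and does not contact any server. Several parts are assumptions. The base URL is a placeholder. The upload path for the DSP file is not given in the text. The key is computed locally as the SHA-1 of the DSP source only to show what a content-derived key looks like; in the described service, the server creates it after checking the syntax. `export_requests` is a hypothetical helper.

```python
import hashlib

BASE_URL = "http://faustweb.example.org"  # hypothetical server address


def export_requests(dsp_code, platform, architecture):
    """Return (key, [(method, url), ...]) for a hypothetical Faust Web export."""
    # Assumption: a SHA-1 digest of the source stands in for the server-issued key.
    key = hashlib.sha1(dsp_code.encode("utf-8")).hexdigest()
    return key, [
        ("GET", f"{BASE_URL}/targets"),                          # list of available targets
        ("POST", f"{BASE_URL}/"),                                # upload of the DSP file (path assumed)
        ("GET", f"{BASE_URL}/{key}/{platform}/{architecture}"),  # compiled binary
    ]


key, requests = export_requests("process = _ * 0.5;", "osx", "coreaudio-qt")
```

Because the key only depends on the uploaded code, the same key can serve several compilation requests (for example one per platform) without re-sending the DSP.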


[Figure 16: FaustLive Export Manager (platform, architecture, source or binary) and the Faust Web requests: GET /Targets returns the list of available targets; POST of the DSP file verifies the Faust syntax and creates a unique SHA-1 key; GET /key/platform/architecture compiles for the chosen architecture and returns the binary file.]

Figure 16: Steps of the compilation chain through Faust Web


4.6 Session Management 

A concept of session is introduced to preserve the state of the application (parameter values, position on screen, audio connections, compilation options, ...) when the application is closed or when the user takes a snapshot, which saves the session in a .tar file.

A FaustLive snapshot is self-contained: all the local resources needed (such as Faust DSPs) are copied into the folder. Pointers to the original resources are used whenever possible, but if a source file is erased or the snapshot is transferred to another machine, the copies are used instead.
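The fallback rule for snapshot resources can be sketched in a few lines of Python. This is only an illustration: `resolve_resource` is a hypothetical helper, and the file names and folder layout are made up. It uses the original file when it still exists and otherwise uses the copy stored in the snapshot folder.

```python
import tempfile
from pathlib import Path


def resolve_resource(original: Path, snapshot_copy: Path) -> Path:
    """Prefer the original resource; fall back to the copy stored in the snapshot."""
    if original.exists():
        return original
    if snapshot_copy.exists():
        return snapshot_copy
    raise FileNotFoundError(f"neither {original} nor {snapshot_copy} exists")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = root / "synth.dsp"                # hypothetical user file
    copy = root / "snapshot" / "synth.dsp"       # hypothetical copy inside the snapshot
    copy.parent.mkdir()
    copy.write_text("process = _;")
    used_when_erased = resolve_resource(original, copy)   # original missing: copy is used
    original.write_text("process = _;")
    used_when_present = resolve_resource(original, copy)  # original present: pointer is used
```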


To decrease the compilation time, the output of the Faust compiler (the optimized LLVM IR code) is saved. When the application is reloaded, the Faust compilation and the LLVM IR-to-IR optimization steps are skipped. For very heavy programs, the gain can be noticeable (from a few seconds to almost instantaneous).

5 Conclusion 

FaustLive brings together the convenience of a standalone interpreted language with the efficiency of a compiled language.

FaustLive currently offers the shortest development cycle for Faust applications, even allowing the code of an application to be modified while it is running. It integrates advanced remote computation and control features for real-time distributed audio applications. Moreover, through its export functionality, FaustLive provides a convenient front-end for Faust Web, the Faust compilation web service. The project is open-source and available on SourceForge [3]. It runs on Linux, OS X and Windows.
Abstract

We present a recent port of the OpenMusic computer-aided composition environment to Linux. The text gives a brief presentation of OpenMusic and typical use-cases of the environment. We also present a short history of its development, and mention previous attempts at porting it to Linux. The main technical challenges involved in developing the current Linux port are discussed, as well as their solutions. We end the paper by outlining some possible areas for future work.

Keywords 

OpenMusic, Computer-Aided Composition (CAC), 
Linux, Common Lisp, Visual Programming, JACK. 

1 Introduction 

OpenMusic (OM) is an environment for computer-aided music composition designed as a domain-specific visual programming language. 1 It allows composers to write and run programs to transform or generate data, with interactive access to the input or output musical structures.

Before 2013, OM was developed and distributed for OS X and Windows only, despite various attempts at porting the environment to Linux. In this paper we present a new, fully functional port of OM to Linux.

In Section 2 we introduce the OpenMusic environment, first from a general point of view, and then discuss some particular aspects such as the user interface and the external dependencies of the environment. We also give a quick history of the development of OM, and of previous attempts at porting the environment to Linux.

Section 3 describes our current implementation choices and the state of the Linux port. We conclude with a number of perspectives and areas for future work.

* This work is supported by BEK - Bergen Center for 
Electronic Arts. 

1 http://repmus.ircam.fr/openmusic/
2 OpenMusic

2.1 A visual programming environment 
for computer-aided composition 

OM is a visual programming environment dedicated to music processing and composition [Assayag et al., 1999]. It implements the main features of the Common Lisp language (abstraction, higher-order functions, recursion, iterations etc., see [Bresson et al., 2009]), 2 as well as object-oriented programming [Agon and Assayag, 2003] and constraint programming [Rueda et al., 1998].

OM is primarily an environment for work in computer-aided composition (CAC). It is also used for musicological tasks like analysis, modeling or statistics, as well as for pedagogical work in composition studies or music theory [Bresson et al., 2011]. The environment comes with a rich set of tools and libraries aimed at composition, analysis, DSP and other musical/extra-musical domains.

The aim of a CAC application is to aid the user in typical composition tasks like generating, representing and manipulating musical material in adequate ways, and handling musical form as the material develops towards a finished piece of music. Contemporary composition is a rather ill-defined activity, 3 and a good CAC tool is one that adapts well to human artistic processes, whatever those may be. Indeed, composers seldom follow strict plans for very long. Perhaps more common is making up a class of material, generating versions of it, developing these further towards more complex structures, going back to change something at an earlier stage to arrive at yet another set of variants, and so on.

2 Functional programming, and Lisp in particular, has a strong background and tradition in computer music history. It has proven to be relatively easy for composers to learn, understand and use.

3 “CAC is in effect making the computer carry out thought processes previously carried out in human brains” (Miller Puckette, preface to The OM Composer’s Book vol. 1 [Puckette, 2006]).

Interactive (visual) programming languages are well-suited environments for composers to work efficiently and creatively with computers. The graphical environment in OM provides an interactive patch-based, data-flow approach, making it easy and intuitive to get going. Figure 1 shows a basic patch window generating a series of chords.



Figure 1: Graphical programming with OM. 

OM’s dynamic environment supports the kinds of interactive work-flows often preferred by composers. It allows the user to evaluate and store output at arbitrary stages in the program flow, and to easily stash away variants of any kind of musical material or data.

OM is also a full-blown environment for Lisp programming, with access to the expected REPL, 4 special editors for Lisp code, cross-referencing and look-up in the source code, and an interactive help system. These tools provide an environment for composers and programmers to develop new ideas and specialized creative work-flows, using graphical programming, textual programming, or any combination of these. In OM there is no particular separation between a function defined as an abstraction in a patch box and one defined by evaluating Lisp code: both are accessible as graphical objects in the user interface.

4 REPL = Read-Eval-Print loop. 


Reports of some composition tasks and approaches carried out with OpenMusic are available in [Agon et al., 2006; 2008].

2.2 User Interface 

As a visual programming environment, OM is 
highly concerned with graphical user interfaces 
(GUI). To a large extent, this aspect determines 
the choices of platforms and frameworks that 
may be used for development. 

While programming in OM, the user inserts and manipulates graphical objects in patch windows, dragging connection-lines between inlets and outlets of boxes, and thus establishing the data-flow of a musical program. These graphical objects may represent simple operators, abstractions or sub-patches, or more complex data like lists, arrays, break-point functions etc.

As part of the graphical programming 
environment, OM provides advanced editors for 
visualization and manipulation of data such 
as musical scores, break-point functions, audio 
waveforms and some other data types. Figure 2 
shows the sound-editor, and Figure 3 shows 
editors for various other kinds of musical data. 



Figure 2: Editor for the OM sound object. 

OM also has a programmable graphical timeline editor called the “Maquette”. Everything else that lives in OM (functions, musical data, connections) can be placed and manipulated in it, either manually or as a result of evaluating some code.

2.3 Development history — Previous 
ports and platforms 

OpenMusic is a descendant of the Patchwork 
environment [Laurson and Duthen, 1989], one 
of the pioneering visual programming systems 
dedicated to computer-aided composition. 

The first release (C. Agon and G. Assayag, IRCAM) was built in 1998 on Mac OS using Digitools’ Macintosh Common Lisp (MCL). The graphical programming system was designed as a full meta-programming framework, implementing functional programming concepts and interactions on a strong base of object-oriented programming and CLOS (Common Lisp Object System [Gabriel et al., 1991]). OM also introduced a number of new concepts, for instance for the handling and manipulation of objects, and for the unified representation of time structures.

Figure 3: OM environment: musical data and editors.

In 2003, OM4 was ported to OSX, and a first Linux port [Sarria and Diago, 2003] was carried out at IRCAM in the framework of the AGNULA European project [Dechelle and Tisserand, 2003], using CMUCL 5 and Gtk+ 6.

Released in 2005, OM5 [Bresson et al., 2005] was a multi-platform version of OM developed on Mac (using MCL) and on Windows (using Allegro CL 7). The OM5 sources were clearly divided into a platform-independent kernel built on top of an abstract graphical/system-dependent API inspired by the MCL toolkit (MCL and ACL versions of this API were then implemented and loaded depending on the targeted platform).

A second port of OM to Linux was initiated in 2006 based on the OM5 sources, using SBCL 8 and a new implementation of the OM API for Gtk+ and CLG 9 (Common Lisp GTK bindings). Unfortunately, this project was never completed.

5 http://www.cons.org/cmucl/

6 http://www.gtk.org/

7 http://www.franz.com/products/allegrocl/

In 2006, Digitools announced the discontinuation of MCL development and support on Mac, due to the switch to Intel processors. The LispWorks environment was chosen as a replacement. It provides a reliable IDE with a common cross-platform API (CAPI) that is compatible with the main graphical toolkits for Mac, Windows and Linux, as well as some other operating systems.

In 2008 OM6 based on LispWorks was 
released for OSX and Windows. 

2.4 External dependencies 

Because of its musical orientation, OM depends on audio I/O systems. It is important for composers to be able to load audio or MIDI files, and to convert and process the contained data in the visual programming framework. Playback and rendering of generated musical material (score/MIDI or audio) is an essential feature of the environment, and complex scheduling issues can arise when dealing interactively with multiple simultaneous sources and audio/MIDI material. Still, OM is not by nature a real-time environment, and its audio performance requirements remain relatively moderate compared to real-time scheduling or audio processing systems.

8 http://www.sbcl.org/

9 http://sourceforge.net/projects/clg/

Low-level MIDI formatting and scheduling was traditionally supported in OM by the MidiShare 10 system [Orlarey and Lequay, 1989]. The audio support, initially limited to a few functions interfacing with the Apple QuickTime library, was replaced in OM5 by a more advanced, multi-platform audio support developed on top of the LibAudioStream 11 library.

Besides MIDI for file I/O, support for exporting other score-type data is provided either by built-in code or by external libraries. OM can export to some widely used file formats for sheet music, like LilyPond 12 and MusicXML 13. OM also supports connections to technologies like SDIF [Schwartz and Wright, 2000] (a standard format for the interchange of sound description data), OpenGL (display of 3D objects in OM editors) and OSC [Wright, 2005] for inter-application communication, although none of these is strictly required to get the visual programming environment running.

OM communicates with external libraries via Common Lisp foreign function interfaces (FFI). OM either uses the FFI provided by the Lisp implementation, or the CFFI system [Bielman and Oliveira, 2013], a common FFI wrapper compatible with several Lisp implementations.

2.5 Distribution and licensing 

While the first OM releases were distributed commercially along with other IRCAM software, since OM6.4 (2011) the compiled OM application has been available free of charge for all platforms.

The source code has always been freely available under the GNU General Public License.

All source code and external libraries required for building OM are open-source. However, building and saving the actual distributed image currently requires a LispWorks Professional or Enterprise license.

3 Towards OM on Linux 

Several previous efforts to port OM to Linux over the years suggest a real interest. OM’s main technological dependency (a professional ANSI Common Lisp implementation with interfaces to graphical toolkits, MIDI and audio libraries) has also been available to Linux developers for a long time. It is therefore legitimate to ask why these previous ports were not more successful.

10 http://midishare.sourceforge.net/

11 http://libaudiostream.sourceforge.net/

12 http://www.lilypond.org/

13 http://www.musicxml.com/

First, the Musical Representations team developing OM at IRCAM uses Mac OS X as its main development platform. As a consequence, porting OM to the attempted environments (CMUCL or SBCL with Gtk) meant rewriting all the graphical dependencies using foreign toolkits or alternative APIs.

Audio and MIDI support has also been an issue in previous Linux ports. Although MidiShare worked fine with earlier Linux versions, 14 it is no longer maintained or kept compatible with newer releases of the Linux kernel. Moreover, since the early releases OM’s MIDI dependencies were packed together and integrated with the kernel code, which made them difficult to replace.

As OM has evolved, work towards a more 
modular structure has been an explicit aim, at 
least since 2005 [Bresson et al., 2005]. Recent 
developments have gradually made the code 
more durable and resistant towards changes in 
compiler-implementations. This tendency both 
motivated and greatly helped the work with 
the current port, making it possible to develop alternative solutions for e.g. MIDI I/O,
separated from those for the kernel, graphics or 
audio I/O. 

3.1 OM6.8 on Linux: Current State 

When starting this project, a separate Linux 
development-branch of the source-tree for 
OM6.7 was set up, making it possible to get 
up to speed on Linux without halting further 
development on the main-branch. At a certain stage the separate Linux-branch was reintegrated with the main source-code. Further development of the application, and delivery of images, is now based on the same source-tree for all three supported platforms (Linux, OS X, Windows), with only a few specializations, mainly in the graphics-code, to account for differences across platforms.

The present Linux development is based on OM6.8 and LispWorks 6.1. The choice of LispWorks (a commercial Lisp compiler and IDE) for porting OM to Linux is entirely pragmatic. As mentioned, LispWorks provides a common API across graphical toolkits (Gtk+, Motif, Cocoa, Windows), and since this library is already used by the OM developers, only moderate adaptations to the existing OM API are required to get it working on Linux.

14 MidiShare was also used in e.g. Common Music [Taube, 1991].

As suggested, this port of OM to Linux can 
be seen as the most recent in a series of steps 
towards making OM more modular. Solutions 
which work across platforms are presumably 
also less vulnerable to changes in compilers 
or toolkits in use. While developing Linux-compatible substitutes for previous code, general and platform-independent solutions were sought. In particular, most external dependencies have been made optional, so that OM can run without some specific features if dependencies are not found, not loaded or not available for a specific environment or platform.

3.2 Audio and MIDI I/O: JACK 

To substitute the low-level I/O systems for 
MIDI and audio, some alternative approaches 
were programmed and tested. The OM “player” 
system was rewritten in OM6.7 as a modular 
API, making it easier to substitute or switch the 
default playback-engines with alternative audio 
or MIDI players. 
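The idea of a modular player API with optional back ends can be illustrated with a small registry. This is a language-neutral sketch written in Python. OM's actual player API is written in Common Lisp and is not reproduced here, so `PlayerRegistry` and the engine classes are hypothetical. Engines whose dependency is missing are simply not registered, and the first available engine becomes the default.

```python
class PlayerRegistry:
    """Hypothetical registry of playback engines, selected by name at run time."""

    def __init__(self):
        self._players = {}
        self.default = None

    def register(self, name, factory, available=True):
        # Optional back ends are skipped when their dependency is not available.
        if available:
            self._players[name] = factory
            if self.default is None:
                self.default = name

    def names(self):
        return sorted(self._players)

    def create(self, name=None):
        name = name or self.default
        if name not in self._players:
            raise KeyError(f"no playback engine named {name!r}")
        return self._players[name]()


class JackPlayer:
    def play(self, obj):
        return f"jack: {obj}"


class SuperColliderPlayer:
    def play(self, obj):
        return f"supercollider: {obj}"


registry = PlayerRegistry()
registry.register("jack", JackPlayer)
registry.register("supercollider", SuperColliderPlayer)
registry.register("midishare", object, available=False)  # dependency not found
```

Callers ask the registry for a player instead of hard-coding one. Switching engines, or running without a missing one, then requires no change elsewhere.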

MIDI messages and Standard MIDI Files can now be parsed and formatted using the Common Lisp MIDI library 15, and E. de Castro Lopo’s libsndfile is used to access audio files on disk. The default playback-engines used in the Linux version of OM depend on libjack. Scheduling and real-time I/O of audio and MIDI between OM and hardware ports (or between OM and other applications) have been programmed as a CL-based JACK 16 client.

Other test-case playback-engines were also developed: one with SuperCollider (audio/MIDI I/O, scheduling) using OSC communication, one with a CL-based FluidSynth 17 server (MIDI) controlled through Lisp code, and one with external MPlayer processes 18 (audio) controlled from a sub-shell. These clients work fairly well, and could serve as examples for users or developers who want to plug in other playback-engines or I/O systems. An alternative audio player has also been developed using sox 19 as part of the OM-Sox library. 20

15 http://www.doc.gold.ac.uk/isms/lisp/midi/ 

16 http://jackaudio.org/ 

17 http://www.fluidsynth.org/ 

18 http://www.mplayerhq.hu/ 

19 Sound eXchange - sox.sourceforge.net/ 

20 M. Schumacher, OM-Sox: http://sourceforge.net/projects/omsox/


3.3 Libraries 

OM comes with a number of specialized tools 
and libraries aimed at composition, analysis, 
DSP and other musical tasks. Some of these 
are “official” libraries: either distributed and 
maintained as part of the main OM package, 
or used to integrate OM with external tools 
such as Csound or some of the DSP engines 
available from IRCAM (SuperVP, PM2, Spat, 
Diphone, Chant). 21 A rich set of 3rd-party 
libraries, maintained by individual composers 
or developers, are also available for download, 22 
together with simple how-tos for adding external libraries (see Figure 4).


Figure 4: OM: Packages library.

Libraries are written as standard Lisp code, so they generally work well with the Linux port. However, those depending on platform-specific external tools (e.g. OM-Spat) are of limited use at the moment.

3.4 Source code and packaging 

The OM source code is distributed as part of the application. 23 The environment features runtime introspection and provides an easily accessible cross-reference for all OM-specific classes and methods. Lisp code may be edited and evaluated interactively, or loaded from files, and may be used to specialize or modify built-in functionality.

Compiling a fresh OM from sources requires access to LispWorks. For this reason, OM is always made available as a pre-built image, one for each platform type. At the time of writing, the Linux version is developed and maintained on a system running Fedora 19. Currently, OM is available as an RPM package containing the image and all sources, which installs the binary and the sources in the usual places. Several users have adapted the RPM packages to dpkg-based systems, seemingly without any serious issues, and have shared how-tos and experiences on the OpenMusic forums 24.

21 http://forumnet.ircam.fr/product/openmusic-libraries/

22 http://repmus.ircam.fr/openmusic/libraries

23 http://repmus.ircam.fr/openmusic/sources

Patches are usually distributed as Lisp files. 
These can be dynamically loaded by the user or 
be automatically sourced on application start 
by placing the files in a predefined location. 

4 Conclusions — Future works 

The first beta release of OM-Linux was made available for download and presented at the IRCAM Forum workshops in November 2013, after having been tested and used by developers and users for some time. 25

The previous Linux ports missed their potential audience and lacked support and follow-up from the community of Linux developers and composers. This was presumably due to a number of obstacles and difficulties that we have tried to outline in this paper. Our hope is that this project will overcome most of these problems.

A working Linux version of OM is useful for end users, and Linux developers may well find good ways to integrate OM with other applications, e.g. through libraries, or to develop OM further. The stabilization of the GUI API may also help to reduce the dependency on LispWorks on all platforms in the future, and make alternative open-source solutions possible. For this effort, starting out from a functional version on Linux may be valuable.

Since the current code uses one common source-tree, the Linux port is relatively robust and sustainable across changes in compiler implementations or frameworks for graphics and I/O. It also minimizes the work involved in maintaining compatibility across platforms for the same application.

The features introduced to make the sources compile and run on Linux, as well as the newly developed support for e.g. audio and MIDI, are of a general kind and potentially useful across platforms. The CL-based MIDI library, as well as the JACK client and callback setup for MIDI and audio, are now possible alternatives for OS X and Windows as well.

24 http://forumnet.ircam.fr/user-groups/

25 A review of this Linux port has been posted by D. Philips on LWN in November 2013, see https://lwn.net/Articles/574593/.

Further work will be done in the near future to integrate other CL-based composition and DSP tools such as Common Music or Common Lisp Music (CLM 26) [Schottstaedt, 1994], and to extend the existing set of libraries connecting OM with open-source software like SuperCollider or LilyPond. While the JACK client is currently useful, its real-time robustness can still be improved, and other audio libraries or backends may also be supported in the future.
Abstract

Radium is a new type of music editor inspired by the music tracker. Radium’s interface differs from the classical music tracker interface by using graphical elements instead of text and by allowing musical events anywhere within a tracker line.

Chapter 1: The classical music tracker interface and how Radium differs from it. Chapter 2: Radium Features: a) The Editor; b) The Modular Mixer; c) Instruments and Audio Effects; d) Instrument Configuration; e) Common Music Notation. Chapter 3: Implementation details: a) Painting the Editor; b) Smooth Scrolling; c) Embedding Pure Data; d) Collecting Memory Garbage in C and C++. Chapter 4: Related software.

Keywords 

Radium, Music Tracker, GUI, Pure Data, Graphics 
Programming. 

1 Introduction 

The tracker interface appeared on the AmigaOS platform in the late 1980s and early 1990s with programs such as Soundtracker, NoiseTracker and Protracker. The first tracker was called “The Ultimate Soundtracker”, and was released in 1987 by Karsten Obarski. 1 2

In the classical tracker interface, time goes downwards. Notes placed higher on the screen are played before notes placed below. 3 Instead of moving the cursor up or down, the whole editor scrolls up or down, and the cursor is just locked in the middle of the screen.

The tracker editor shows a two-dimensional 
table in which musical events can be stored. We 
can think of it as a spreadsheet with tracks as 
columns and lines as rows. 

1 According to Wikipedia: http://en.wikipedia.org/wiki/Music_tracker

2 http://en.wikipedia.org/wiki/Ultimate_Soundtracker

3 When I started making Radium, I also considered letting time go in the horizontal direction. I don’t remember why I chose the vertical direction.


Musical events are defined with plain text. The event C#3 5-32-000 plays the note C sharp at octave 3 using instrument number 5 at volume 32. The last three zeroes can be used for various types of sound effects, or to set a new tempo.
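The text format of such an event is easy to decode. The Python sketch below parses exactly the illustrative layout used above: note letter, `#` or `-`, octave, then instrument, volume and a three-digit effect field. It reads the numbers as decimal, as the text does. Real trackers differ in details (many use hexadecimal fields), so the pattern and the `TrackerEvent` type are assumptions for illustration.

```python
import re
from dataclasses import dataclass

# Assumed layout: <letter><#|-><octave> <instrument>-<volume>-<effect>
EVENT_RE = re.compile(r"^([A-G])([#-])(\d)\s+(\d+)-(\d+)-(\d{3})$")


@dataclass
class TrackerEvent:
    note: str
    octave: int
    instrument: int
    volume: int
    effect: str


def parse_event(text):
    """Decode an event such as 'C#3 5-32-000' into its fields."""
    match = EVENT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a tracker event: {text!r}")
    letter, accidental, octave, instrument, volume, effect = match.groups()
    note = letter + ("#" if accidental == "#" else "")  # '-' marks a natural note
    return TrackerEvent(note, int(octave), int(instrument), int(volume), effect)


event = parse_event("C#3 5-32-000")
```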

The tables are called patterns, and a song usually contains several patterns. To control the order in which patterns are played back, we use a playlist. For example, if we have three patterns, a typical song could have a playlist like this: 1, 2, 1, 2, 3, 1, 2.
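Playing a song through its playlist amounts to concatenating the patterns in playlist order. The Python sketch below illustrates this with three made-up two-line patterns. Each row holds one event text or `None` for an empty line, and the helper name `arrange` is hypothetical.

```python
def arrange(patterns, playlist):
    """Return the rows of the song by concatenating patterns in playlist order."""
    rows = []
    for number in playlist:
        rows.extend(patterns[number])
    return rows


patterns = {
    1: ["C-3 1-64-000", None],
    2: ["E-3 1-64-000", None],
    3: ["G-3 1-64-000", "C-4 1-64-000"],
}
song = arrange(patterns, [1, 2, 1, 2, 3, 1, 2])
```

Since patterns are referenced by number, a repeated section costs no extra storage: editing pattern 1 changes all three places where it is played.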

1.1 How Radium Differs from the 
Classical Tracker Interface 

Radium 4 differs from the music tracker inter¬ 
face by using graphical elements instead of text 
and by allowing any number of events to be 
placed anywhere. 5 The latter means that a line 
in Radium is essentially just a graphical hint. It 
should be possible to compose millions of years 
of music within just one tracker line. 
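One way to allow events anywhere within a line is to store positions as exact fractions of a line rather than as line indices. The Python sketch below uses `fractions.Fraction` for this. Radium's actual data structure is not described here, so `place` and `split` are hypothetical helpers. `split` shows how a line (or an already split part of a line) can be divided again without losing precision.

```python
from fractions import Fraction


def place(line, numerator=0, denominator=1):
    """Exact position: a line number plus a fraction of that line."""
    if not 0 <= numerator < denominator:
        raise ValueError("the fraction must lie inside the line")
    return line + Fraction(numerator, denominator)


def split(start, length, parts):
    """Start positions of the parts obtained by splitting a span into equal parts."""
    return [start + length * Fraction(i, parts) for i in range(parts)]


events = [("D#3", place(12, 2, 3)), ("C-3", place(12, 1, 3)), ("D-4", place(12))]
events.sort(key=lambda event: event[1])  # exact ordering inside line 12

sublines = split(place(12), Fraction(1), 3)  # line 12 split into thirds
finer = split(sublines[1], Fraction(1, 3), 3)  # the second third split again
```

Exact rationals never round, so nested splits and densely packed events keep a well-defined order, however many of them share one line.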

These differences are so fundamental, that it’s 
questionable whether Radium can be defined as 
a tracker. 

1.2 History of Radium 

The first version of Radium was released in 2000 under the GPL license, and it only supported MIDI. After the initial release, Radium was developed actively for around a year, followed by a period of less development between 2001 and 2012. Since 2012, Radium has been actively developed again.

The features presented in this paper have 
mostly been implemented in 2012 and 2013. 
Audio support was introduced in November 
2012. 

4 http://users.notam02.no/~kjetism/radium/ 

5 This feature may not be useful, depending on how you compose music. But at least when using split lines, and for accurately importing MIDI files and other music formats, it’s a necessary feature.

1.3 Portability

The first version of Radium was released
for the Amiga Operating System (AmigaOS),
version 3.0 or later. The code was written in 
a portable style, where non-portable code was 
clearly separated and easy to replace. An alpha 
version for Linux was available already in 2001. 

At the time of writing, Radium is available for Linux, Windows, and Mac OS X. Linux is the main development platform and the platform with the most features. It should be straightforward to port Radium to any platform that has Jack, POSIX, and Qt.

1.4 Term 

In the rest of this paper, the word “line” means 
a tracker line, and not a vertical graphical line 
(i.e. a row of pixels) or an automation line. 
When we refer to a graphical line, the expression “graphical line” will be used instead. When we refer to an automation line, the expression “automation line” or “break point line” will be used.

2 Radium Features 
2.1 The Editor 

The image below shows a Block (the name of 
patterns in Radium). 6 From left to right, we 
see a vertical slider, line numbers (12-29), a 
green and blue area indicating tempo, an LPB 
track (Lines Per Beat), a BPM track (Beats 
Per Minute), a RelTempo track (for doing time- 
varying tempo changes), plus two sound tracks; 
a drum loop track and a bass track: 
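The LPB and BPM values together determine how long a line lasts. A beat lasts 60/BPM seconds and contains LPB lines, so one line lasts 60/(BPM × LPB) seconds. The Python sketch below assumes a constant tempo and ignores RelTempo. The helper names are hypothetical.

```python
def seconds_per_line(bpm, lpb):
    """Duration of one line: a beat lasts 60/bpm seconds and holds lpb lines."""
    if bpm <= 0 or lpb <= 0:
        raise ValueError("BPM and LPB must be positive")
    return 60.0 / (bpm * lpb)


def line_start(line, bpm, lpb):
    """Start time in seconds of a line, assuming a constant tempo from line 0."""
    return line * seconds_per_line(bpm, lpb)


duration = seconds_per_line(120, 4)  # 120 BPM, 4 lines per beat
```

At 120 BPM with 4 lines per beat, each line lasts 0.125 seconds, so line 8 starts one second into the block.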



We also see that text is used to denote pitch (“D-4”, “D#3”, and “C-3”), while graphical break point lines are used to define tempo changes and effect automation. Pitch can also be shown graphically, using vertical lines, but text is clearer and more accurate.

6 The word “Block” comes from Octamed (http://en.wikipedia.org/wiki/Octamed). I think Block is a better name than Pattern, at least in Radium where events can be placed freely and don’t have to follow a pattern.

2.1.1 Editor Elements

• Audio waveforms are shown in the tracks:

• Time-varying tempo changes (accelerando/ritardando) are defined with break point lines. The audio waveforms are updated in real time:

• Effects, e.g. reverb or chorus, are also defined with break point lines:

...And so are time-varying pitch changes (glissandos):
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• Pitch can be defined with unlimited preci¬ 
sion. The pitch below is placed 82 cents 
above C sharp at octave 4: 



• Lines can be split. Splitting is essentially just a way to zoom in on one line so that you have more space to edit, but it can also be used to define measures. Furthermore, split lines can themselves be split, the resulting lines can be split again, and so forth:



• Updating the graphics too often can be tiring for the eyes. The SPS option (Scrolls Per Second) limits the number of screen updates per second. SPS is an effective way to make viewing more comfortable when smooth scrolling is not used.
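A scrolls-per-second limit can be implemented by skipping redraws until at least 1/SPS seconds have passed since the previous one. The following is a minimal sketch, not Radium's code: the `sps_limiter` type and its functions are hypothetical, and the caller is assumed to supply a monotonic time in milliseconds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limiter: allows at most `sps` screen updates per second. */
typedef struct {
    double min_interval_ms; /* 1000 / sps */
    double last_update_ms;  /* time of the last accepted update */
    bool   has_updated;     /* false until the first update */
} sps_limiter;

void sps_init(sps_limiter *l, double sps) {
    assert(sps > 0.0);
    l->min_interval_ms = 1000.0 / sps;
    l->last_update_ms = 0.0;
    l->has_updated = false;
}

/* Returns true if the editor should be redrawn at time `now_ms`. */
bool sps_should_update(sps_limiter *l, double now_ms) {
    if (l->has_updated && now_ms - l->last_update_ms < l->min_interval_ms)
        return false; /* too soon after the previous redraw: skip it */
    l->last_update_ms = now_ms;
    l->has_updated = true;
    return true;
}
```

Skipping a redraw, rather than postponing it, means the next accepted update always shows the current position, so the display does not lag behind playback.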



a “diode” that lights up when receiving note 
events: 



The green lines show connections for note events, such as Note On, Note Off, Note volume changes, and Note pitch changes.

The other connections (those painted in a color that resembles mortar 7) show audio connections.

2.2.1 Separate channel routing 

In the Mixer GUI, an audio connection sends all channels from one sound object to another. To, for instance, send only the left channel, or receive only the right channel, the audio connection must be routed through special channel routing objects.

The idea is that spending a little extra time routing channels separately when necessary is faster overall than always having to connect every channel manually.
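A channel routing object can be thought of as a small gain matrix that decides which input channel feeds which output channel. The sketch below is only an illustration under that assumption: `route_channels` and the row-major matrix layout are hypothetical and do not describe Radium's actual routing objects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical channel router: out[o][f] = sum over c of in[c][f] * gains[o][c].
   `gains` is a row-major num_out x num_in matrix; 0/1 entries select channels. */
void route_channels(const float *const *in, int num_in,
                    float *const *out, int num_out,
                    const float *gains, size_t num_frames) {
    assert(num_in > 0 && num_out > 0);
    for (int o = 0; o < num_out; o++) {
        for (size_t f = 0; f < num_frames; f++) {
            float sum = 0.0f;
            for (int c = 0; c < num_in; c++)
                sum += in[c][f] * gains[o * num_in + c];
            out[o][f] = sum;
        }
    }
}
```

With the matrix {1, 0, 1, 0}, both outputs receive only the left input channel; {0, 1, 1, 0} would swap the channels, and {0.5, 0.5, 0.5, 0.5} would give a mono sum on both outputs.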


• The velocity of new notes can be set using a random walk algorithm (drunk velocity). The algorithm tries to simulate how a musician varies volume while playing:

• All editing operations can be undone and redone. The number of undo steps is limited only by system memory.
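The paper does not specify the drunk velocity algorithm, so the following is only a minimal sketch of a bounded random walk of the kind described. The MIDI-style velocity range 1-127, the maximum step size, the `drunk_velocity` type, and the small linear congruential generator (used so that results are reproducible) are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state for a "drunk" velocity generator. */
typedef struct {
    uint32_t seed;     /* PRNG state */
    int      velocity; /* current velocity, 1..127 in MIDI terms */
    int      max_step; /* largest change between two consecutive notes */
} drunk_velocity;

/* Small LCG (Numerical Recipes constants); good enough for a sketch. */
static uint32_t lcg_next(uint32_t *state) {
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Returns the velocity for the next note and advances the walk. */
int drunk_next(drunk_velocity *d) {
    assert(d->max_step >= 0);
    uint32_t r = lcg_next(&d->seed) >> 16; /* high bits are more random */
    /* Step in [-max_step, max_step]. */
    int step = (int)(r % (uint32_t)(2 * d->max_step + 1)) - d->max_step;
    d->velocity += step;
    /* Keep the walk inside the valid velocity range. */
    if (d->velocity < 1)   d->velocity = 1;
    if (d->velocity > 127) d->velocity = 127;
    return d->velocity;
}
```

Because each new velocity is a small step away from the previous one, consecutive notes sound related, unlike uniformly random velocities, which jump around independently.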

2.2 The Modular Mixer 

The modular mixer provides a graphical interface to route note events and audio signals between sound objects.

The role of a sound object is to produce audio, receive audio, produce note events, receive note events, or any combination of those.

Inside each sound object in the Mixer GUI, there is a volume slider, a mute button, a bypass button, VU meters (one for each channel), and


2.3 Instruments and Audio Effects 

2.3.1 Sampler Instrument 

This instrument can play: 1) normal sound files, 8 2) Fasttracker instruments, 9 or 3) SoundFonts [Rossum and Joint, 1995]:



7 “Name that color”: http://chir.ag/projects/name-that-color/#594C5B

8 All formats supported by libsndfile: http://www.mega-nerd.com/libsndfile/

9 Text files describing the Fasttracker 
“XI” instrument format: 1) “XI format description” by 
“KB / The Obsessed Maniacs / Reflex”, 2) “The XM 
module format description for XM files version $0104” 
by “Mr.H of Triton” in 1994. 
2.3.2 VST Plugins and Instruments

Native VST plugins and instruments are supported on Linux, OS X, and Windows:




2.3.3 Pure Data (Pd) 

Pd processes can be inserted anywhere in the sound graph. The Pd GUI is opened by double-clicking the sound object. There is no limit on the number of simultaneous instances:

Custom Pd controllers make it possible to control Pd from Radium, and to control Radium from Pd. The Pd controllers appear as Radium effects, similar to “Max for Live”: 10



2.3.4 STK Instruments 

Radium includes 20 STK instruments based on physical modeling [Cook and Scavone, 1999]. These were written by Romain Michon in the Faust language [Michon and Smith, 2011]. Michon’s instruments have been slightly modified to be used as instruments in Radium.

10 https://www.ableton.com/en/live/max-for-live/


2.3.5 Zita Reverb 

Fons Adriaensen’s “Zita Rev1” reverb (Zita Reverb), 11 implemented by Julius O. Smith III in Faust [Smith, 2012]. 12 Zita Reverb is also used as the default reverb when creating a new song. 13

2.3.6 Multiband Compressor 

A multiband compressor. The DSP is implemented in Faust using the following components written by Julius O. Smith III [Smith, 2012]: compressor, lookahead limiter, band splitter, and smoothing.

2.3.7 Other Instruments and Audio Effects

a) LADSPA plugins (Richard Furse’s Linux Audio Developer’s Simple Plugin API); b) Maarten de Boer’s multitap delay “Tapiir” [De Boer, 2001], implemented by Yann Orlarey in Faust; c) a Fluidsynth instrument, using libfluidsynth; 14 d) sound objects to send audio to, or receive audio from, Jack clients; e) a pipe object; f) channel routing objects (section 2.2.1); g) MIDI output.

2.4 Instrument and Plug-in Configuration Widget

Sound goes through five parts in the instrument and plug-in configuration widget. From left to right in the picture below, we see: 1) a Note Duplicator, 2) an automatically created Plugin/Instrument GUI, 3) a Compressor, 4) an Equalizer, and 5) settings for dry/wet, panning, stereo width, reverb, chorus, and output volume:





1) Using the note duplicator, an instrument can play several note events when a single note is played. This is convenient, for instance, to double the bass or add a simple echo. Up to six notes can be played for each incoming note event, and for each of those six notes, the user can specify values for transposition, volume change, delay, and duration. If

11 http://kokkinizita.linuxaudio.org/linuxaudio/zita-rev1-doc/quickguide.html

12 https://ccrma.stanford.edu/~jos/Reverb/Zita_Rev1_Reverberator.html

13 The “Calf Multichorus” LADSPA plugin (written by Krzysztof Foltman) is used as the default Chorus effect.

14 http://www.fluidsynth.org
you need more than six notes, you can connect the sound object to yet another sound object to duplicate the notes further:


[Figure: note duplicator settings for a “Guitar” instrument]
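The note duplicator in item 1) can be modeled as a table of up to six rows, each producing one transformed copy of the incoming note. A minimal sketch, assuming a simple note representation (pitch in semitones, volume in dB, start and duration in milliseconds); the type and function names are hypothetical, and so is the convention that a zero duration keeps the original duration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DUPLICATES 6 /* the limit per incoming note given in the text */

typedef struct {
    double pitch;       /* semitones; fractional values allowed */
    double volume_db;
    double start_ms;
    double duration_ms;
} note;

/* One row of the duplicator table (hypothetical layout). */
typedef struct {
    bool   enabled;
    double transpose;   /* semitones added to the pitch */
    double volume_db;   /* dB added to the volume */
    double delay_ms;    /* added to the start time */
    double duration_ms; /* replaces the duration; 0 keeps the original */
} duplicate_row;

/* Writes one note per enabled row to `out`; returns how many were made. */
int duplicate_note(const note *in, const duplicate_row rows[MAX_DUPLICATES],
                   note out[MAX_DUPLICATES]) {
    int n = 0;
    for (int i = 0; i < MAX_DUPLICATES; i++) {
        if (!rows[i].enabled)
            continue;
        note copy = *in;
        copy.pitch     += rows[i].transpose;
        copy.volume_db += rows[i].volume_db;
        copy.start_ms  += rows[i].delay_ms;
        if (rows[i].duration_ms > 0.0)
            copy.duration_ms = rows[i].duration_ms;
        out[n++] = copy;
    }
    assert(n <= MAX_DUPLICATES);
    return n;
}
```

Under this model, feeding the output into a second duplicator, as suggested above, could yield up to 6 × 6 = 36 notes per incoming note.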


2) Sliders and buttons are automatically created 
for all instruments and plug-ins, based on the 
controllers they provide: 


[Figure: automatically generated GUI for the “Calf MultiChorus” LADSPA plugin]


3) The compressor has a novel interface which 
tries to show more intuitively how the sound is 
squashed together. The DSP code is written in 
Faust by Julius O. Smith III [Smith, 2012]. 


[Figure: compressor settings (threshold, ratio, makeup gain, attack, release) and equalizer settings (low shelf, middle band, high shelf)]
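The threshold, ratio, and makeup gain shown in the compressor interface define a static gain curve. The sketch below shows the textbook hard-knee form of that curve in the dB domain. It is not derived from Smith's Faust code, the function names are hypothetical, and it leaves out the attack/release smoothing that a real compressor applies to the detected level.

```c
#include <assert.h>

/* Static hard-knee compressor curve, all values in dB.
   Levels above the threshold are reduced by the ratio; makeup gain
   is added afterwards. Returns the output level in dB. */
double compressor_output_db(double input_db, double threshold_db,
                            double ratio, double makeup_db) {
    assert(ratio >= 1.0);
    double out = input_db;
    if (input_db > threshold_db)
        out = threshold_db + (input_db - threshold_db) / ratio;
    return out + makeup_db;
}

/* Gain (in dB) to apply to the signal for a given detected level. */
double compressor_gain_db(double input_db, double threshold_db,
                          double ratio, double makeup_db) {
    return compressor_output_db(input_db, threshold_db, ratio, makeup_db)
           - input_db;
}
```

For example, with a threshold of -20 dB and a 4:1 ratio, a peak at 0 dB comes out at -15 dB before makeup gain.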


2.5 Common Music Notation 

Scores can be generated automatically from Radium files with Bill Schottstaedt’s notation software Common Music Notation (CMN) [Schottstaedt, 1997]. 15 The generated scores can be further tweaked in CMN, either by editing the generated CMN code, or by writing code that further modifies it. The latter technique is used to generate this score:

15 https://ccrma.stanford.edu/software/cmn/



3 Implementation Details 

Radium is mainly written in C and C++. Some code is also written in Python, Faust [Orlarey et al., 2009], and Scheme. 16

3.1 Painting the Editor 

The visible part of the editor is painted line by line into a backbuffer. When the editor is scrolling, we only copy the corresponding tracker lines from the backbuffer to the screen. When a tracker line is no longer visible, its place in the backbuffer is marked as free and becomes available for painting a new line. This way, we neither have to repaint everything for every update, nor scroll the screen or the backbuffer.
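The bookkeeping described above amounts to a small cache that maps tracker lines to fixed-height slots in the backbuffer. A minimal sketch under that reading; the slot count and all names are hypothetical. A visible line that already has a slot only needs to be copied, a new line claims a free slot and must be painted first, and slots of lines that have scrolled out of view are released.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_SLOTS 64 /* rows of tracker-line height in the backbuffer */

typedef struct {
    int line[NUM_SLOTS]; /* tracker line stored in each slot, -1 if free */
} line_cache;

void cache_init(line_cache *c) {
    for (int i = 0; i < NUM_SLOTS; i++)
        c->line[i] = -1;
}

/* Frees every slot whose line is outside the visible range [first, last]. */
void cache_release_invisible(line_cache *c, int first, int last) {
    for (int i = 0; i < NUM_SLOTS; i++)
        if (c->line[i] != -1 && (c->line[i] < first || c->line[i] > last))
            c->line[i] = -1;
}

/* Returns the slot holding `line`. *needs_paint is set when a free slot
   had to be claimed, i.e. the line must be painted before it is copied. */
int cache_get_slot(line_cache *c, int line, bool *needs_paint) {
    int free_slot = -1;
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (c->line[i] == line) {
            *needs_paint = false;
            return i;
        }
        if (c->line[i] == -1 && free_slot == -1)
            free_slot = i;
    }
    assert(free_slot != -1); /* more visible lines than slots */
    c->line[free_slot] = line;
    *needs_paint = true;
    return free_slot;
}
```

For example, if line 10 scrolls out of view while line 11 stays visible, a newly visible line 41 reuses line 10's slot and ends up stored above line 11, which is the kind of non-chronological ordering discussed in the next paragraph.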

Unfortunately, this strategy causes the order of the lines in the backbuffer not to be chronological (newer lines often appear below older lines). Non-chronological order makes it impossible to paint graphical elements that span several lines in one operation. This limitation causes breakpoint boxes to be squashed up against the ceiling and floor of a tracker line, 17 and automation lines to be slightly disconnected (or to overlap) because of anti-aliasing artifacts. 18 Scrolling the backbuffer 19 would not solve the problem either, since graphical elements can start before the visible area or end after it. Therefore, at least some graphical elements have to be painted in several operations anyway.

An alternative solution that would solve the graphical problems is to make the backbuffer big enough to contain the complete block, but this solution could occupy too much memory. 20

16 Using the Guile interpreter 

17 This effect probably looks more like a feature, 

18 while this effect probably looks more like a bug. 

19 or modifying Qt so that the underlying coordinate system would match the order of the lines in the backbuffer

20 There are of course other solutions as well that would solve the graphical problems while still keeping the backbuffer, but I think they would complicate the code too much to be worth the effort.
However, since today’s desktop computers (2014) seem fast enough to simply repaint the screen when necessary, the strategy of painting tracker lines one by one into a backbuffer will probably be removed in the near future, even though it was efficient enough to make the program usable on hardware from 1992. 21 Not using a custom backbuffer gives simpler code and fewer graphical artifacts, and it makes it easier to add new graphical features, since graphical operations are no longer confined to individual tracker lines.

3.2 Smooth scrolling 

Music trackers have traditionally updated the screen only when the current tracker line changes (i.e. scrolling line by line). By updating the screen at each vertical blank instead, we get smooth scrolling.

Smooth scrolling looks amazing compared to scrolling line by line, but, perhaps more importantly, it seems significantly less tiring for the eyes.

3.2.1 Render using the CPU 

The first attempt to achieve smooth scrolling was to make Radium render the screen by copying line by line from the backbuffer at each vertical blank. To achieve sub-pixel accuracy, all painting operations on the backbuffer were performed n times, painting to n different backbuffers, where each backbuffer was slightly offset relative to the next one, all within the span of one pixel in the vertical direction. A good value for n would be at least 4.
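With n backbuffers offset by k/n pixel (k = 0, 1, ..., n-1), a fractional scroll position can be shown by splitting it into a whole-pixel offset used when copying, and the index of the backbuffer whose offset is closest to the remaining fraction. A minimal sketch, assuming non-negative scroll positions in pixels; the names are hypothetical.

```c
#include <assert.h>

typedef struct {
    int whole_pixels; /* integer part: how far to shift when copying */
    int buffer_index; /* which of the n pre-offset backbuffers to use */
} subpixel_choice;

/* Picks the backbuffer whose offset (k/n pixel) best matches the
   fractional part of `scroll_px`. */
subpixel_choice choose_backbuffer(double scroll_px, int n) {
    assert(n > 0 && scroll_px >= 0.0);
    int whole = (int)scroll_px;      /* truncation == floor for x >= 0 */
    double frac = scroll_px - whole; /* in [0, 1) */
    int k = (int)(frac * n + 0.5);   /* round to the nearest k/n */
    if (k == n) {                    /* rounded up to a full pixel */
        k = 0;
        whole += 1;
    }
    subpixel_choice c = { whole, k };
    return c;
}
```

Rounding to the nearest k/n keeps the positioning error within 1/(2n) pixel, i.e. 1/8 pixel for n = 4.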

One problem with this attempt was that the time needed to render a frame varied a bit, and it was easy to miss the vertical blank deadline and get a frame glitch. The usual vertical blank period for an LCD screen is 16⅔ ms (60 Hz), so we don’t have much time, and we can’t trust the OS to wake us up soon enough if the current process for some reason has yielded in the middle of rendering.

A graphical glitch is very apparent when the whole screen moves in one direction at a constant speed, so to avoid frame glitches, Radium rendered frames in a separate thread and put them on a ringbuffer which the main thread would read from. 22 If a single frame took more time to render than 16⅔ ms, we still avoided

21 Amiga 1200 

22 This strategy is similar to how we reliably get sound 
in real time from a non-deterministic source, for instance 
a hard drive. 


a glitch if the average rendering time was less than 16⅔ ms.
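The render thread and the main thread form a single-producer, single-consumer queue. Below is a minimal lock-free ring buffer sketch using C11 atomics; frames are represented by a plain `int` identifier instead of pixel data, and the capacity and all names are hypothetical. At 60 Hz the main thread takes one frame every 1000/60 ≈ 16.67 ms, so frames rendered in advance can absorb an occasional slow frame as long as the renderer keeps up on average.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAPACITY 8 /* must be a power of two */

typedef struct {
    int frames[RING_CAPACITY]; /* stand-in for rendered frames */
    atomic_size_t head;        /* next slot to write (producer only) */
    atomic_size_t tail;        /* next slot to read (consumer only) */
} frame_ring;

void ring_init(frame_ring *r) {
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Called by the render thread. Returns false if the ring is full. */
bool ring_push(frame_ring *r, int frame_id) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_CAPACITY)
        return false;
    r->frames[head & (RING_CAPACITY - 1)] = frame_id;
    /* Release: the frame data is visible before the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Called by the main thread at each vertical blank.
   Returns false if no frame is ready, i.e. a visible glitch. */
bool ring_pop(frame_ring *r, int *frame_id) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;
    *frame_id = r->frames[tail & (RING_CAPACITY - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

A failed `ring_pop` at a vertical blank corresponds to the frame glitch the buffering is meant to avoid.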

However, this strategy did not fit well with the existing painting system (the code became very complicated), and it also had quite high CPU usage (which made it more prone to frame glitches), so it was abandoned.

3.2.2 Render using the GPU 

A more successful attempt at achieving smooth scrolling has been to use OpenGL in 2D mode. By letting the GPU repaint everything at each vertical blank, we achieve both smooth scrolling and very low CPU usage. Another advantage is significantly smaller and simpler code, since we don’t use the type of backbuffer described in section 3.1, and scrolling is only a matter of sending updated y coordinates for the graphical objects to OpenGL.
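In the GPU approach, each graphical object can keep its position in block coordinates, so that scrolling changes only a single value. A minimal sketch of that mapping, assuming a fixed line height in pixels, a fractional play position measured in lines, and a play cursor drawn at a fixed screen row; all names are hypothetical.

```c
#include <assert.h>

typedef struct {
    double line_height_px; /* height of one tracker line on screen */
    double cursor_y_px;    /* screen row where the play position is drawn */
} scroll_view;

/* Screen y coordinate for something placed at `line` (may be fractional,
   since events are not bound to tracker lines), given the current play
   position in lines. Only `play_pos_lines` changes while scrolling. */
double line_to_screen_y(const scroll_view *v, double line,
                        double play_pos_lines) {
    assert(v->line_height_px > 0.0);
    return v->cursor_y_px + (line - play_pos_lines) * v->line_height_px;
}
```

Since only `play_pos_lines` changes between frames, the offset could be applied once per frame as a translation in the projection or model-view transform instead of recomputing every vertex on the CPU.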

This code is currently under development and 
should replace the current system soon. 

3.3 Embedding Pd 

Radium uses Peter Brinkmann’s libpd 23 as the basis for embedding Pd.

Libpd is a thin layer of code that turns Pd into a library [Brinkmann et al., 2011]. Libpd doesn’t include the Pd GUI, and it has some other limitations as well, so a “Radium fork” of libpd has been made to include features needed by Radium. 24

The first modification was to re-add the GUI and create an API to control it. Several other enhancements and required modifications followed, such as loading and saving patches and adding a void* argument to the MIDI functions.

3.3.1 Libpds (libpd with an extra ’s’) 
However, the biggest challenge in using libpd is that only one Pd instance can run in a process at a time. With only one instance, you can’t send sound from one patch to another in the Radium mixer (at least not if there is a non-Pd sound object between those two). Nor, for that matter, can you make a LADSPA or VST plugin out of a Pd patch.

To circumvent this limitation, an additional library called libpds has been added to the Radium fork of libpd. Libpds makes it possible to load several Pd instances and communicate with them separately. Libpds has almost the same API as libpd, except that most functions take an additional “pd instance” parameter.

23 http://libpd.cc

24 http://github.com/kmatheussen/libpd 
Libpds works by dynamically loading a new libpd library file for each new Pd instance. To avoid clashes between the global variables of the various Pd instances, dlopen is called with the RTLD_LOCAL flag when opening “libpd.so”. The RTLD_LOCAL flag prevents symbols from being shared globally.

Unfortunately, this behavior causes problems when loading Pd externals (i.e. plugins which are loaded at runtime). Pd externals require access to functions and global variables provided by Pd, but since Pd doesn’t share its symbols globally, the externals fail to load.

The chosen solution to the problem is to statically link the most common Pd externals into libpd. 921 externals are currently included, among them most of the externals distributed with the Pd distribution Pd-Extended. 25 To compile that many externals without manually writing a large Makefile, a script recursively scans a list of directories and compiles all the externals it can find.

A slightly simpler way to load externals would 
be to link the Pd externals directly (i.e. instead 
of recompiling), but using a “.so” file as a static 
library does not work. 

It is likely that there are better ways to support externals, such as implementing a new dynamic linking system, but the current solution seems to work well for now.

3.4 Garbage collection 

Since the start, Radium has used Hans Boehm’s garbage collector for C and C++ (BDW-GC) as its memory manager [Boehm and Weiser, 1988]. It is not necessary to free memory manually when using a garbage collector, so Radium has fewer lines of code, and most likely fewer bugs, because of this choice.

There has been no trouble with BDW-GC, and Radium has not had memory leaks. It is strange that BDW-GC is not used in most large programs written in C or C++.

4 Related software and how their features compare to Radium

4.1 Jeskola Buzz 

Jeskola Buzz 26 appeared in 1997-1998. 27 Jeskola Buzz was probably the first tracker with a modular mixer. The modular mixer in Radium is inspired by the one in Jeskola Buzz,

25 http://puredata.info/downloads/pd-extended

26 http://www.jeskola.net/buzz/

27 http://en.wikipedia.org/wiki/Jeskola_Buzz


but the modular mixer in Jeskola Buzz doesn’t support sending note events, nor sound objects with more than two channels.

4.2 Aodix 

Aodix 28 was released before 2002, but I don’t know exactly when. Depending on how old it is, Aodix may have been the first tickless tracker. Tickless means that events are not bound to tracker lines, a feature shared with Radium. Another feature shared with Radium is that you can apparently zoom in and out of the patterns.

4.3 Renoise 

Renoise 29 was released in 2002. Renoise is a 
more traditional tracker than Jeskola Buzz and 
Aodix, but has more features. 

Renoise uses one instrument per track, which is similar to Radium, but Renoise lets you organize tracks further by optionally grouping tracks and instruments, for instance all drum tracks or all vocal tracks. Grouping makes patterns visually clearer and simpler to navigate, and it simplifies adding effects to a group of instruments (since they are already grouped). Grouping is a feature that is currently missing in Radium.

Renoise also supports effect automation and tempo automation, but unlike in Radium, the graphics are placed horizontally in a separate area below the tracks, not in the tracks themselves.

5 Conclusion 

Radium presented a radical change to the classical tracker interface when it was released fourteen years ago.

The following is a list of larger tracker features that first appeared in Radium (at least to my knowledge). An appended * means that Radium is still the only tracker, or tracker-like program, that provides this feature, again to my knowledge:

a) Smooth scrolling*; b) Limitation on the number of scrolls per second*; c) Tickless timing (may have been introduced in Aodix before Radium); d) Zoom in/out (may have been introduced in Aodix before Radium); e) Waveform data visible in tracks; f) The “Radium Compressor” compressor interface*; g) Pitch values shown graphically; h) Tempo automation*; i) Effect automation*; j) Volume automation*;

28 http://www.kvraudio.com/product/aodix-by-arguru-software/details

29 http://www.renoise.com
k) Pitch automation*; l) Adjustable track widths*; m) Pd or Max/MSP integration*; n) Track headers with volume control and instrument name*; o) Automatic MIDI preset change when playing a note for an instrument with a different preset*; p) Line splitting (including line split splitting, line split split splitting, etc.)*; q) Unlimited number of simultaneously playing notes per track, and no limitation on when they may start and stop playing*; 30 r) Unlimited number of blocks, tracks and lines; s) Generate scores with CMN*; t) Unlimited undo/redo; u) Send pitch change events between instruments*; 31 v) Configurable menus*.

This list of (more or less useful) new features 
shows that Radium has tried to be an innovator 
for tracker software. Radium will try to be an 
innovator in the future as well. 
Dominique Fober / Albert Gräf / Stéphane Letz / Yann
Orlarey / Julius O. Smith III: Faust; Krzysztof Foltman:
Soundfont parser in libgig; Giles Hall: The python-midi li-
Lopo: libsamplerate and libsndfile; Romain Michon: The
I also want to especially thank Yann Orlarey
for creating the Faust programming language

31 I.e. polyphonic aftertouch for pitch instead of volume
Music and Art Programme


Klangdom Concert I 

Thursday, 01.05.2014, 20:00 h, ZKM_Cube 


Anthony Di Furia: Through the Space of Crying 

The project is based on tin (the chemical element) as a conceptual starting material and on the analysis of the “tin cry”, the characteristic sound a bar of tin makes when bent at room temperature. TIN META-SONIFICATION SYNTH is software written in SuperCollider, based on two sonification processes: the first is derived from the physical-chemical characteristics of tin, the second from its atomic number and atomic radius. Variations in pressure and temperature control, in real time, the values of density, speed of sound, state of matter, boiling point and melting point, which in turn control a first synth.

The atomic radius and atomic number are the basis of an additive synthesis complex, modulated in frequency, amplitude and phase. The generated sound is spatialized with first-order Ambisonics (ATK, the Ambisonic Toolkit), according to the theory of atomic orbitals.

The resulting composition/improvisation is a journey through the sonic dimensions of tin; it creates a “bond” between the real sound of the tin cry and an imaginary soundscape.

Patrick Hartono: The Complete Series of Kecapi (2012-2013) 

The Complete Series of Kecapi (2012-2013) is a merger of three different compositions (Kecapi I, II, III) that I created for Gaudeamus Muziekdag (Gaudeamus Jonge Componistenbal) at Rasa theatre Utrecht in January 2014.

Kecapi is a series of electroacoustic compositions that I started in October 2012, and that has been developed into four different versions. The sound material of Kecapi is essentially manipulated recordings of the kecapi/sitar (an Indonesian traditional plucked string instrument), recorded in Jogjakarta. Each version of Kecapi has a different approach and concept.

Kecapi I was selected to be premiered in the Sound Gallery during WOCMAT 2012, Taiwan. Kecapi II was selected to be premiered during the WOCMAT 2013 concert, and was also selected as a finalist of the Taiwan International Electroacoustic Music Award. Kecapi III was premiered as part of the Behind The Score Concert at Codarts Rotterdam Conservatorium in 2013.

Jose Rafael Subia Valdez: Chiral 

Inspired by the chemical sense of the word, Chiral is a piece that tries to apply some of this property of chemical symmetry to music. It seemed interesting to write a piece after thinking about the chirality of our hands when playing the piano: some chords can only be played by a particular hand, left or right. Furthermore, the relationship between the piano and the computer-generated sounds is always arranged symmetrically in time, gesture or pitch. Conceptually, symmetry is also considered in the piece’s form and sonority.

Chiral uses a Pure Data patch to analyze, process and generate sounds; all audio is produced in real time. It was composed using PCSlib in two stages of production: a computer-assisted composition stage and the programming of the performance patch itself.

Giorgio Klauer: Haar 

Decomposing sound into particles, sensitivity and masking effects in auditory perception, and friction in bowed instruments are the themes intertwined in this composition and signalled in its title: Haar (hair), as in Pferdehaar, Haarzelle, Alfred Haar.

The sound actuation model is that of a bowed instrument, yet it has been implemented with 50-70 cm long, thick, black human hairs gently rubbed against the cantilever of a moving magnet phono cartridge. The sonic character was then dramatically emphasized by means of a granular composition environment programmed in sclang. In this implementation, envelope, pitch, spatialization, indexing and further grain controls have been determined by perceptual feature descriptors extracted from the very same sounds, resulting in an anamorphic and multidimensional micro-editing process.

Louise Harris: sys_ml 

sys_ml is an eight-minute electroacoustic composition realized using systemic, a system I constructed for real-time composition, performance and sound spatialisation controlled via a physics-based visual environment. In systemic, physics-based algorithms govern the behaviour of objects in a visual system, and the movement of those visual objects controls the spatialisation, via vector base amplitude panning, of corresponding sound objects over an 8-channel circular speaker configuration. sys_ml is composed from a number of recordings taken from systemic. The sonic material is a combination of pre-composed sound objects and real-time synthesized sound. By utilizing a physics-based visual system to control the spatialisation of sound, I am effectively removing decision-making from the spatialisation process.

Martin Hunninger: Spaces 

Spaces explores the relation of two different sound objects in three different spaces. The inspiration for this piece stems from the mathematical concept of topological space, a structure that allows one to define notions such as connectedness, continuity, inside, outside, openness and closedness. The composition develops a path from pure abstract space, through the inside of an acoustically closed room, into an outdoor scene. The composition is written entirely in SuperCollider and runs on any up-to-date computer. It features aleatoric elements, so each performance differs subtly from the others, while the overall structure is fixed.
Klangdom Concert II

Friday, 02.05.2014, 20:00 h, ZKM_Cube 

Klangdom concert with productions of the ZKM | Institute for Music and Acoustics


Concert “Playroom” 

Friday, 02.05.2014, 22:00 h, ZKM_Music Balcony 


Jürgen Reuter: Random Noise: Concert for Sound Column Four Hands

Two players give a concert in a competitive manner. They put up and rearrange colored shapes and symbols on an advertising column that slowly rotates. The surface of the column is scanned, and a computer program renders the shapes and symbols into sound as they move under a virtual play head cursor that is projected onto the column. Since the players compete in an uncoordinated fashion rather than cooperate, the overall picture grows wild. Both players struggle to dominate the system by putting as much information as possible onto the column. As their competition finally results in complete chaos, the overall informational content approaches zero, resulting in random noise.

Bruno Ruviaro & Carr Wilkerson: Vowelscape 1.0 

Vowelscape 1.0 is a collaborative audiovisual performance by Bruno Ruviaro (Santa Clara University) and Carr Wilkerson (CCRMA/Stanford). Strangled robotic voices and flickering letters are some of the building blocks of this study on the poetic resonances of isolated vowels.

Mauricio Valdes & Juri Pohleven: DNA Sequencer 

DNA Sequencer is an ongoing project based in Slovenia, dedicated to creating artistic outputs involving real-time sound art generation, video interaction and genetics. Our research is expanding continuously in order to create complex networks of information that can be used in our performances. The project is conceived as an inter-media show presented in one act without any cuts, consisting of real-time generated sound art and video controlled with Pure Data and Max/MSP, all supported by one synthesizer, one prepared electric guitar and one laptop computer.

The performance begins with a sequence of notes with fully mechanized sonic and visual events without human control; the events are an exact replica of the translation of the genetic code into music and image. Subsequently, a certain signal on a DNA sequence triggers the gradual incorporation of real-time sound events, through which the performers in turn also affect the visualization encoded by the genome. This starts a sort of dialogue/battle between the concepts arising from the genome’s output and the performers’ improvisation. Accordingly, the piece is a hybrid of improvisation and sequencing, which imitates the fluid environment in which the rigid genome exists and reflects the epigenetic view of biological systems. In an ideal scenario, the events that occur live overlap with the sequenced ones, leaving the resulting sound and image as a live expressiveness. The genetic algorithm is eventually dominated by, and subjected to, control parameters that keep the performance running; its participation and control remain essential for parameterizing certain events of our electronic instruments, but without its own sound. The play ends as the musicians decide on stage ...

Bruno Ruviaro & Juan-Pablo Caceres: Panela de Pressão

Panela de Pressão is an improvisation over the network with Bruno Ruviaro and Juan-Pablo Caceres. Juan-Pablo will be playing live from Santiago, Chile. The two musicians started playing together in 2004, when they first met in the United States. After a long hiatus, the duo finally resumed playing last year, now mostly through network performances using JackTrip. “Panela de Pressão” means “pressure cooker” in Portuguese.

Louise Harris: intervention:coaction: 

The project is a live, audiovisual, beat-and-noise-based performance work. The intention is to create a symbiotic system, in which live decision making by the performer impacts both the audio and visual components of the work, but in which the audio and visual components can also interact with one another, causing behaviours that are not directly controlled by the system performer. There is also an element of chaotic behaviour built into the system, causing unpredictable audio and visual outcomes.

Malte Steiner: Elektronengehirn - Concert reqPZ 

Audiovisual electroacoustic concert by Malte Steiner’s project Elektronengehirn. The input from piezo contact microphones is analyzed with Pure Data on a Linux laptop, controlling sound and graphics. The microphones are used as something between percussion triggers and pickups: sometimes the piezo sound is used directly, in other parts only as control data for synthesis and the visuals. The contact mics are attached to a metal plate which is played by striking and beating it with sticks, resulting in an expressive performance.


Klangdom Concert III 

Saturday, 03.05.2014, 20:00 h, ZKM_Cube 


Luis Valdivia: Xaevluox 

The piece was finalized in March 2014, and was realized using SuperCollider on Fedora 19/20.

Florian Hardieb: Out of the Fridge 

Out of the Fridge was composed as a ballet insertion for a new staging of Christoph Willibald Gluck’s opera “Il Parnaso confuso”, which was premiered in the Schönbrunner Schlosstheater in Vienna in 2011. Alienated sounds from a refrigerator, such as the fridge buzzing, shaking ice cubes or the clicking noise inside the freezer, are the main source material for the piece (in the ballet, the refrigerator was an important part of the scenery). All sound processing was realised with the language Csound. The first part of the work is about constructing and deconstructing, with a clear harmonic structure and some rhythmic elements. The second part dissolves the harmonic structure; it is more confused and presents a processed collocation of the material.

Fernando Lopez-Lezcano: Divertimento de Cocina 

Divertimento de Cocina (Kitchen Divertimento) stages several kitchen scenes, with sounds and rhythms layered, controlled and triggered by a live performer. Kitchen utensils are mixed, transformed and orchestrated in real time through a LaunchPad controller and a custom set of SuperCollider classes and programs. Rhythms that are initially extremely simple get progressively more complicated as they are layered together into increasingly thick textures in the initial section of the piece. While the performer walks you through different soundscapes, the initial rhythms form the backbone and guide for the rest of the piece. The SuperCollider program also spatializes all sounds under the control of the performer in a 3D soundscape that can be diffused through an arbitrary number of speakers (the original sound stream is internally generated in Ambisonics, with at least 3rd-order periphonic resolution).

Clemens von Reusner: rooms without walls 

rooms without walls was composed in 2012 for a 4x4 array of loudspeakers built 
at the Platz der Weltausstellung (Expo 2000) in Hanover, Germany. The arrangement 
of the 16 speakers/light steles in the sculptural appearance is strictly geometric. In an 
abstract way it recalls the geometric spatial divisions of baroque gardens, as they can 
still be found in the Hanover Royal Gardens. In the composition, sets of four corners of a 
square define 14 square, overlapping areas of different sizes and positions. These 14 
virtual rooms are implemented with Ambisonics as sonic rooms, each furnished with 
different sounds that move simultaneously in individual orbits. 

The sound material on which this composition is based has been designed in terms 
of its spectro-morphological development and its structure, contrasting with the existing 
sound of the public space. The third-order Ambisonic spatialization was done with 
Csound. Because the original 4x4 loudspeaker setting is unique, an 8-channel 
concert version of the piece is played. 

Ali Ostovar: A Thin Light Behind the Fog 

The title of this piece is very descriptive, but the piece itself is more about sounds, based on 
spectrum and timbre. Physical modeling synthesis, granular synthesis and some other 
techniques were used. 

Bernardo Barros & Mario del Nunzio: Improvisation 


Sound Night 

Saturday, 03.05.2014, 22:00 h, ZKM_Music Balcony 


Wolfgang Spahn: ENTROPIE 

ENTROPIE is a noise and projection performance. Both sound and projection are 
based on different analogue and digital machines developed by the artist. Each system 
simultaneously generates structured noise and abstract light patterns. 

The invention of moving pictures went along with an artificial separation of sound 
and image. ENTROPIE merges them again. It makes the data stream of a digital projector 
audible and gives an audio-visual presentation of the electromagnetic fields of coils 
and motors. 

All hardware and software was developed by the artist as an open hardware/software system 
(http://www.dernulleffekt.de). The two cameras and the control system run on 
three Raspberry Pis. A Pure Data patch handles control and a Python program 
manages the camera output. 

Renick Bell: Algorave Improvisation 

This performance of improvised programming generates algorave, danceable percussive 
music emphasizing generative rhythms. The rapidly changing algorithmic bass 
music is intended to stimulate dancing. Using a custom live coding system called 
Conductive with the Vim text editor and GHCi, the Haskell language interpreter, 
multiple concurrent processes are used to trigger a SuperCollider-based software sampler 
loaded with thousands of audio samples. At least two methods of rhythm pattern 
generation are employed: stochastic methods and L-systems. Patterns from both are then 
processed to generate variations with higher and lower density, which are deliberately 
chosen during the performance. The performance also involves programming to 
control other parameters. The programming activity is projected for the audience to see. 

Tiny Boats (Jason Jones & Jesse Crowley): Burn in the Sun 

A song composed by the two of us. Recorded, edited, mixed, and mastered in Linux with 
Harrison Mixbus at Art City Sound in Springville, UT. 

This was the last song we produced for our album ‘The Broken Vessels’. It was 
created specifically to be the first song on the album, to set the tone and establish some 
themes and expectations for the rest of the songs. The song itself is a journey from a 
lost, cynical world view into a triumph of thought and hope. It helps to illustrate that 
the interactions we have with people help to shape the way we see the world and 
help us better understand our place in it. 

Yan Michalevsky: Locum Meum 

Locum Meum is an electronic composition using synthetic sounds, produced 
using the MuLab DAW. Nevertheless, some melodic parts can optionally be played live and 
were composed with the constraints of real instruments in mind. While aiming at the 
House/Techno genre, this composition is very melodic and subjective, and it manifests the 
personal musical preferences of the composer. Synthetic sounds, a classical violin 
part and rock drum fills are combined in an attempt to blur genre boundaries. 
Unsound Scientist (Amos Przekaza): Selected Works DJ Set 

Selected Works is a collection of pieces composed mostly during 2013. Consisting of 
about 10 songs, it represents the learning process of using Linux tools for music 
production and compositional exploration in general. All works were made entirely 
with Linux software, mainly LMMS and Ardour along with ZynAddSubFX, AMSynth, 
Hydrogen, TAL-NoiseMaker, Qtractor, Phasex, Synthv1 and other tools packaged with the 
KXStudio distribution. Live instrumentation along with software and hardware 
synthesizers were used as well. 

Yen Tzu Chang: Self-Luminous 2 - Unbalance 

Self-Luminous 2 is a little bit different from the first one: it is much easier to control, 
lighter, and the design is more organized. I also added an HMC6343, an electronic 
compass sensor for the Arduino. When I move the instrument in different directions, some 
sounds are accidentally affected (e.g., some sounds suddenly disappear or are covered 
by other sounds). Sometimes this is a bit risky ... Part of my performance is 
impromptu. When I play, it is interesting that the performance is severed violently from 
the “self”. The instrument was built using Pure Data and Arduino. 

Vincent Rateau & Daniel Fritzsche: Superdirt 2 

Superdirt 2: fascinating electro beats mixed with virtuosically performed cello sounds, 
resulting in a never-before-achieved danceability! With Ras Tilo at the 
synthesizers and Kapt'n Dirt on the cello, it provides a musical experience situated 
between drum'n'bass, jungle, dub, dubstep and even far beyond ... 

Jeremy Jongepier: The Infinite Repeat 

A musician with over 20 years of experience and a computer with Linux. That’s what it 
boils down to. The result: conventional, decent songwriting with an eclectic tinge, 
because of the choice not to walk the trodden paths, coupled with an autodidactic 
background, an outspoken personal taste and an open-minded world-view. The keyword 
for this year’s Linux Sound Night is danceability. So that is what The Infinite Repeat 
will be bringing on stage, danceable tracks in the best indie-electronic tradition with 
an emphasis on melody and finding the right balance between traditional instruments 
and sounds generated by software running on the OS that has given name to this event. 

Bart Brouns: The WOP machine 

One man, 160 oscillators. An improvised live performance with a realtime synthesizer 
controlled by singing and beat-boxing. The WOP machine is the subject of a workshop 
on May 1st, 14:45 h in the Workshop-space. 

William Light: visinin - Modern Electronic Club Music 

Kentucky-born and Ohio-raised, William Light now lives in Germany, writing software 
and music. With his solo project visinin, he explores the sounds of modern electronic 
music, everything from heavy club beats to relaxed, downtempo soundscapes. 
Jakub Pisek: Turbosampler 


The audio-visual aspect of the performance absolutely refutes the illusion that the world 
is showing us a legible face that we need only decipher. This is a game of provisional 
meanings that do not lead anywhere. Synchronized A/V samples are handled not like 
pieces of music, but like potatoes. Everything is controlled by controllers everybody 
knows, and probably owns. Keyboard and mouse are on the table; only the screen has a 
different meaning here. 


Music and Sound Art Installations 

Thursday-Saturday, 01.-03.05.2014, 14:00-18:00 h 


Louise Harris: Indusium 

Indusium is a work in progress: an exploration of additive compositional process. Groups 
of material are added to and extended step by step, the visuals reflecting the changing 
timbral colours caused by unexpected crossovers in pitch and rhythm. 

Lightune.G (Miodrag Gladovic & Bojan Gagic): Lighterature Reading: Chapter 12 

Lighterature Reading is an ambient audio/visual luminoacoustic installation. Chapter 
12 consists of nine solar panels that convert light and video projection into sound 
images. The composition is seventeen minutes long and is repeated in a loop. Its duration 
was chosen as the ideal length of one side of a twelve-inch vinyl LP record. Besides the 
basic composition, which is fully programmed by the authors, it also includes audience 
intervention with different types of hand lamps, which visitors can pick up at the entrance. 
Each visitor who wishes to intervene in the work can also enter a personal email 
address on the computer near the entrance to receive a snapshot of the composition with his 
or her intervention as an audio recording via email. The duration of a snapshot is three 
minutes, which is the ideal duration of one side of a seven-inch vinyl single. 

Hanspeter Portner: CHIMAERA - The Poly-Magneto-Phonic Theremin 

The Chimaera is a touch-less, expressive, network-ready, polyphonic music controller 
released as open source hardware. It is a mixed analog/digital offspring of the theremin: 
An array of analog, linear Hall-effect sensors and their vicinity make up a continuous 
two-dimensional interaction space. The sensors are excited with neodymium magnets 
worn on fingers. The device continuously tracks and interpolates the position and 
vicinity of multiple magnets present along the sensor array to produce corresponding 
low-latency event signals. Those are encoded as Open Sound Control bundles, transmitted 
via UDP/TCP to a Linux host running a dynamic event dispatcher, translated to musical 
events and finally rendered to audio via SuperCollider, Yoshimi and AMSynth according 
to ever-morphing mappings that respond to the visitors' input dynamics. Visitors 
are free to interact with and experience the expressiveness of the continuous-pitch 
instrument. 
Wolfgang Spahn: Bild einer Ausstellung. An Installation of Audio/Visual Interferences 

A hundred years ago the Futurist Luigi Russolo introduced the Intonarumori, and the Bauhaus 
artist László Moholy-Nagy later built his “Light-Space Modulator”. The first apparatus 
generates noise and the second generates a moving light/shade pattern. Both aspects are 
part and parcel of the installation Bild einer Ausstellung. A laboratory-like setting 
functions as the generator of both abstract projections and corresponding noise. In 
contrast to Modest Mussorgsky's composition “Pictures at an Exhibition”, the sound is 
recorded directly from the picture. 

The “Picture Disc” is a transparent miniature picture based on a chemical-magnetic 
experiment with liquid iron. A laser optically scans the rotating picture. After 
the light beam is transformed into an electric signal, it is modulated by an electric circuit based 
on an old analog 808 drum machine. A Raspberry Pi camera with a mounted macro 
lens films and magnifies the abstract picture and projects it above the whole 
installation. All technology and machines are based on open hardware and software and 
documented on the artist's website http://www.dernulleffekt.de. 